Java scanner to ignore delimiter in csv in case of ""

I am using Scanner to split tokens on ";". When a string is quoted, I need the scanner to ignore any ; inside the quotes "".
I also need fields to be delimited by the quotes themselves.
ProjectID Name Describtion Summary
ID-322;"oba stb; iba logo ""T"" ";dg-eiiod
ID-349;Sttring;dg-enc05
ID-888;Data;dg-enc05
As you can see, the string "oba stb; iba logo ""T"" " contains ;, which is my delimiter.
I need the scanner to ignore it. Right now it splits the string into "oba stb and iba logo ""T"" ", which I don't want.
Currently I have:
scanner.useDelimiter(";|\t");

Don't use Scanner for parsing CSV files; use a CSV parser.
Almost all Java-based CSV parsers allow you to use delimiters other than commas.
For example, with Apache Commons CSV (just to pick one at random):
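import java.io.File;
import java.nio.charset.Charset;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

// RFC 4180 quoting with ';' as delimiter: a ';' inside "..." is kept and "" is a literal quote.
// withFirstRecordAsHeader() maps the header row, so record.get("ProjectID") can look up by name.
CSVFormat format = CSVFormat.RFC4180.withDelimiter(';').withFirstRecordAsHeader();
Charset charset = Charset.defaultCharset(); // or StandardCharsets.UTF_8
try (CSVParser parser = CSVParser.parse(file, charset, format)) { // file is a java.io.File
    for (CSVRecord record : parser) {
        String projectID = record.get("ProjectID");
        String name = record.get("Name");
        String description = record.get("Describtion"); // spelled as in the file's header
        // ...
    }
}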
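If adding a library is not an option, the quoting rules in play here (RFC 4180 style: a field wrapped in double quotes may contain the delimiter, and "" inside it stands for one literal quote) are simple enough to handle by hand. The following is a minimal sketch of a hand-written field splitter, not part of Commons CSV; the class and method names are hypothetical, and it assumes every record sits on one line (no line breaks inside quoted fields).

```java
import java.util.ArrayList;
import java.util.List;

public class SemicolonCsv {

    // Splits one record on ';' with RFC 4180-style quoting:
    // a ';' inside "..." is kept, and "" inside quotes becomes a single ".
    // Assumption: the record contains no embedded line breaks.
    public static List<String> splitRecord(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" is an escaped quote
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // ';' is literal inside quotes
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ';') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(splitRecord("ID-322;\"oba stb; iba logo \"\"T\"\" \";dg-eiiod"));
    }
}
```

Each line of the file can be passed through splitRecord; if quoted fields can span several lines, a real CSV parser such as the one above is the safer choice.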

Related

Why do CSVWriter and CSVReader use different default escape characters?

Here is my code snippet which I am using:
StringWriter writer = new StringWriter();
CSVWriter csvwriter = new CSVWriter(writer);
String[] originalValues = new String[2];
originalValues[0] = "t\\est";
originalValues[1] = "t\\est";
System.out.println("Original values: " + originalValues[0] +"," + originalValues[1]);
csvwriter.writeNext(originalValues);
csvwriter.close();
CSVReader csvReader = new CSVReader(new StringReader(writer.toString()));
String[] resultingValues = csvReader.readNext();
System.out.println("Resulting values: " + resultingValues[0] +"," + resultingValues[1]);
The output of the above snippet is:
Original values: t\est,t\est
Resulting values: test,test
The backslash ('\') character is gone after the round trip!
After some basic analysis, I figured out that this happens because CSVReader uses the backslash ('\') as its default escape character, whereas CSVWriter uses the double quote ('"') as its default escape character.
What is the reason behind this inconsistency in default behavior?
To fix the above problem, I found the following two solutions:
1) Overriding the default escape character of CSVReader with the null character:
CSVParser csvParser = new CSVParserBuilder().withEscapeChar('\0').build();
CSVReader csvReader = new CSVReaderBuilder(new StringReader(writer.toString())).withCSVParser(csvParser).build();
2) Using RFC4180Parser, which strictly follows the RFC 4180 standard:
RFC4180Parser rfc4180Parser = new RFC4180ParserBuilder().build();
CSVReader csvReader = new CSVReaderBuilder(new StringReader(writer.toString())).withCSVParser(rfc4180Parser).build();
Can either of the above approaches cause side effects on other characters?
Also, why is RFC4180Parser not the default parser? Is it only to maintain backward compatibility, since RFC4180Parser was introduced in a later version?

I think we are looking at two types of escaping here.
1) Escaping the double quote in CSV:
test,"Monitor 24"", Samsung"
test,"Monitor 24\", Samsung" // Linux style
Since we have a comma in the second field, that field has to be surrounded with double quotes. Any double quotes inside that field then have to be escaped, with "" or \".
2) \ is also a general escape character, as in \t (tab) or \n (newline).
Since 'e' is not in the list of characters to escape, the \ is simply ignored and removed.
So if you write "t\\\\est" (as a Java literal), the file will contain "t\\est" (an escaped backslash) and you will get "t\est" after reading. Writing "\\test" would likewise lose its backslash and read back as "test": the reader drops a \ unless it is followed by a quote or another backslash, and it does not turn \t into a tab.
To keep the \ after reading, you would indeed have to tell the parser somehow to ignore those sequences, but the current behaviour doesn't look inconsistent to me: both are actually treating the \ as an escape character.
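To make the reading side concrete, here is a simplified model in plain Java of how the content of a quoted field is unescaped when '\' is the escape character: a backslash followed by a quote or another backslash escapes that character, and any other backslash is dropped. This only illustrates the behaviour described above; it is not opencsv's actual code, and the class and method names are hypothetical. It also shows a writer-side workaround: doubling every backslash before writing lets the default reader restore it.

```java
public class EscapeModel {

    // Simplified model of unescaping the content of a quoted field
    // when the escape character is '\' (quote handling itself is not modelled).
    public static String unescape(String field) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == '\\') {
                if (i + 1 < field.length()
                        && (field.charAt(i + 1) == '\\' || field.charAt(i + 1) == '"')) {
                    out.append(field.charAt(i + 1)); // escaped '\' or '"' is kept
                    i++;
                }
                // any other '\' is silently dropped
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical writer-side workaround: double each backslash before writing.
    public static String escapeBackslashes(String value) {
        return value.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        System.out.println(unescape("t\\est"));                    // the question's t\est
        System.out.println(unescape(escapeBackslashes("t\\est")));
    }
}
```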

Java OpenCSV Split by pipe limited

I am having an issue when reading from a file using comma split. I can read the file like this:
CSVReader reader = new CSVReader(new FileReader(FileName), '|' , '"' , 0);
Then when I want to get the individual values, I can read them like this:
String[] record = rowString.split(",");
The issue, of course, is that a comma is not the most reliable separator. Is there any way to split the string on the pipe delimiter, like this?
String[] record = rowString.split("\\|");
This is how I am reading the lines; perhaps this is the code where I need to make that adjustment?
for(String[] row : allRows){
String rowString = Arrays.toString(row).toString();
String[] record = rowString.split(",");
}
Thank you.

I don't know if this answers the question, but in my case this solved the problem:
val reader: Reader = Files.newBufferedReader(path)
val csvToBean = CsvToBeanBuilder<MyCsvSchema>(reader)
.withType(MyCsvSchema::class.java)
.withSeparator('|')
.withIgnoreLeadingWhiteSpace(true)
.build()
val list = csvToBean.parse()
This is Kotlin code.
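In Java the same idea applies: opencsv's CSVReader already returns each record as a String[] split on the separator you passed to it ('|' in the question), so there should be no need to join the row with Arrays.toString and split it again on commas; the fields are row[0], row[1], and so on. If you do need to split a raw line yourself, a minimal sketch follows. The class and method names are hypothetical, and it does not handle quoted fields.

```java
import java.util.Arrays;

public class PipeSplit {

    // Splits a raw line on '|'. The pipe is escaped because split() takes a regex,
    // and the limit -1 keeps trailing empty fields such as the last one in "a|b|".
    public static String[] splitOnPipe(String line) {
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        // A comma inside a field stays in that field, unlike with split(",")
        String[] record = splitOnPipe("ID-1|Smith, John|");
        System.out.println(record.length);
        System.out.println(Arrays.asList(record));
    }
}
```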

Detect ETX characters using Java

We have a text file with multiple paragraphs, each separated by the
ETX character, as shown in the image below. The Java program should
read and parse each paragraph, and the end of each paragraph should be
identified by ETX.
How can I make Java recognize or detect the ETX character?

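ETX is an ASCII control character ("End of Text"), designed to mark the end of a block of text.
It also has the Unicode code point U+0003, so you can find it by searching your string for that code point. In Java source it can be written as '\u0003' (or the octal escape '\003'); if you are working with raw bytes rather than a String, look for the byte value 0x03.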
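As a sketch of searching a String for U+0003, assuming the whole text is already in memory (the class and method names are hypothetical), indexOf finds an ETX and split cuts the text into paragraphs:

```java
import java.util.Arrays;
import java.util.List;

public class EtxSplit {

    // ETX is U+0003; the octal escape '\003' would work as well.
    public static final char ETX = '\u0003';

    // Returns the paragraphs between ETX characters. String.split drops
    // trailing empty strings, so a final ETX does not produce an empty paragraph.
    public static List<String> paragraphs(String text) {
        return Arrays.asList(text.split(String.valueOf(ETX)));
    }

    public static void main(String[] args) {
        String text = "first paragraph" + ETX + "second paragraph" + ETX;
        System.out.println(text.indexOf(ETX)); // index of the first ETX
        for (String p : paragraphs(text)) {
            System.out.println(p);
        }
    }
}
```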

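The String representation of the ETX character is String.valueOf((char) 3):
import java.io.File;
import java.util.Scanner;

File file = new File("myfile.txt");
String etx = String.valueOf((char) 3);
// try-with-resources closes the Scanner (and the file); throws FileNotFoundException if missing
try (Scanner scanner = new Scanner(file)) {
    scanner.useDelimiter(etx); // every ETX ends a paragraph
    while (scanner.hasNext()) {
        String paragraph = scanner.next();
        System.out.println("Paragraph = " + paragraph);
    }
}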

Csv: search for String and replace with another string

I have a .csv file that contains:
scenario, custom, master_data
1, ${CUSTOM}, A_1
I have a string:
a, b, c
and I want to replace 'custom' with 'a, b, c'. How can I do that and save to the existing .csv file?

Probably the easiest way is to read one file and write to another as you go, modifying it line by line.
You could try something with tokenizers. This may not be exactly right for your input/output, but you can adapt it to your CSV file's formatting:
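import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.StringJoiner;
import java.util.StringTokenizer;

String custom = "custom";
String replace = "a, b, c";
// try-with-resources closes both files even if an exception is thrown
try (BufferedReader reader = new BufferedReader(new FileReader("input.csv"));
     BufferedWriter writer = new BufferedWriter(new FileWriter("output.csv"))) {
    for (String line = reader.readLine(); line != null; line = reader.readLine()) {
        StringJoiner output = new StringJoiner(",");
        StringTokenizer tokenizer = new StringTokenizer(line, ",");
        // while/hasMoreTokens visits every token, including the last one
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            // fields in the sample are separated by ", ", so compare without the spaces
            output.add(token.trim().equals(custom) ? replace : token);
        }
        writer.write(output.toString()); // write the rebuilt line
        writer.newLine();
    }
}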
If this is a one-off thing, it also has the benefit of not requiring you to research regular expressions (which are quite powerful, useful, and good to know, but maybe for a later date?).

Have a look at "Can you recommend a Java library for reading (and possibly writing) CSV files?"
Once the values have been read, search for strings/values that start with ${ and end with }. Use a Java regular expression such as \$\{(\w+)\}. Then use a map to look up each key found and its related value; Java Properties would be a good candidate.
Then write a new CSV file.
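A minimal sketch of the lookup step on one line of text, using a plain HashMap in place of Properties (a swapped-in choice for brevity); the class and method names are hypothetical, and reading and writing the CSV itself is left to whichever library you pick:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderReplace {

    // Matches ${KEY}; group 1 captures the key (one or more word characters).
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replaces every ${KEY} whose key is in the map; unknown keys are left as they are.
    public static String replace(String line, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement stops '$' or '\' in the value from being interpreted
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("CUSTOM", "a, b, c");
        System.out.println(replace("1, ${CUSTOM}, A_1", values)); // 1, a, b, c, A_1
    }
}
```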

Since your replacement string is quite unique, you can do it quickly without complicated parsing: read your file into a buffer, convert that buffer into a string, replace all occurrences of the text you wish to replace with your target text, then convert the string back to bytes and write them to the file.
Pattern.quote is required because replaceAll treats its first argument as a regular expression, and "${CUSTOM}" contains the special characters $, { and }. If you don't quote it, you may run into unexpected results.
Also, it's generally not wise to overwrite your source file. It's best to create a new file, then delete the old one and rename the new one to the old name. That way, an error halfway through will not wipe out all your data.
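import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;
final Path yourPath = Paths.get("Your path");
byte[] buff = Files.readAllBytes(yourPath);
String s = new String(buff, Charset.defaultCharset());
s = s.replaceAll(Pattern.quote("${CUSTOM}"), "a, b, c"); // match "${CUSTOM}" literally
Files.write(yourPath, s.getBytes(Charset.defaultCharset())); // same charset as for reading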
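The advice about not overwriting the source file can be sketched as: write to a temporary file in the same directory, then move it over the original. This sketch assumes the file is UTF-8 (the snippet above uses the platform default) and swaps replaceAll plus Pattern.quote for String.replace, which already replaces literally; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeReplace {

    // Writes the modified content to a temporary file next to the original,
    // then moves it over the original. If writing fails, the original is untouched.
    public static void replaceInFile(Path file, String target, String replacement) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        String updated = content.replace(target, replacement); // literal, no regex
        Path dir = file.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "replace", ".tmp");
        try {
            Files.write(tmp, updated.getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("sample", ".csv");
        Files.write(csv, "scenario, custom, master_data\n1, ${CUSTOM}, A_1\n".getBytes(StandardCharsets.UTF_8));
        replaceInFile(csv, "${CUSTOM}", "a, b, c");
        System.out.print(new String(Files.readAllBytes(csv), StandardCharsets.UTF_8));
    }
}
```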

java convert String windows-1251 to utf8

Scanner sc = new Scanner(System.in);
System.out.println("Enter text: ");
String text = sc.nextLine();
try {
String result = new String(text.getBytes("windows-1251"), Charset.forName("UTF-8"));
System.out.println(result);
} catch (UnsupportedEncodingException e) {
System.out.println(e);
}
I'm trying to change the keyboard layout: input from a Cyrillic keyboard, output in Latin. Example: qwerty -> йцукен
It doesn't work. Can anyone tell me what I'm doing wrong?

First, Java text (String, char, Reader, Writer) is internally Unicode, so it can combine all scripts.
This is a major difference from, for instance, C/C++, where there is no such standard.
Now, System.in is an InputStream for historical reasons, so reading text from it requires an indication of the encoding used.
Scanner sc = new Scanner(System.in, "Windows-1251");
The above explicitly sets the conversion for System.in to Cyrillic. Without this optional parameter the default encoding is taken. If that was not changed by the software, it would be the platform encoding. So this might have been correct too.
Now text is correct, containing the Cyrillic from System.in as Unicode.
You would get the UTF-8 bytes as:
byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
The old "recoding" of text was wrong; drop that line. In fact, not all Windows-1251 bytes form valid UTF-8 multi-byte sequences.
String result = text;
System.out.println(result);
System.out is a PrintStream, a rather rarely used historic class. It prints using the default platform encoding, so you more or less have to rely on the default encoding being correct.
System.out.println(result);
For printing to an UTF-8 encoded file:
byte[] bytes = ("\uFEFF" + text).getBytes(StandardCharsets.UTF_8);
Path path = Paths.get("C:/Temp/test.txt");
Files.write(path, bytes); // Files has no writeAllBytes; write(Path, byte[]) is the counterpart of readAllBytes
Here I have added a Unicode BOM character at the front, so that Windows Notepad may recognize the encoding as UTF-8. In general, one should avoid using a BOM. It is a zero-width no-break space (invisible) and plays havoc with all kinds of formats: CSV, XML, file concatenation, cut-copy-paste.
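To see why the question's "recoding" line garbles the text, here is a small sketch comparing the two encodings. It assumes the JDK provides the windows-1251 charset (common JDKs do, but it is not one of the charsets guaranteed by StandardCharsets); the Cyrillic letters are written as Unicode escapes so the source stays ASCII-only, and the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RecodeDemo {

    public static final Charset CP1251 = Charset.forName("windows-1251");

    // The question's approach: windows-1251 bytes decoded as if they were UTF-8.
    public static String recodeLikeQuestion(String text) {
        return new String(text.getBytes(CP1251), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u0439\u0446\u0443\u043a\u0435\u043d"; // the six letters of "йцукен"
        // One byte per Cyrillic letter in windows-1251, two in UTF-8
        System.out.println(text.getBytes(CP1251).length);                 // 6
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 12
        // Those single bytes are not valid UTF-8, so U+FFFD replacement characters appear
        System.out.println(recodeLikeQuestion(text).indexOf('\uFFFD') >= 0); // true
    }
}
```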

The reason why you have gotten the answer to a different question, and nobody answered yours, is because your title doesn't fit the question. You were not attempting to convert between charsets, but rather between keyboard layouts.
Here you shouldn't worry about character encodings at all: simply read the line, convert it to an array of characters, go through them, and convert each one using a predefined map.
The code will be something like this:
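import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

// Generic types must be boxed: Map<Character, Character>, not Map<char, char>
Map<Character, Character> table = new TreeMap<>();
table.put('q', 'й');
table.put('Q', 'Й');
table.put('w', 'ц');
// .... etc
Scanner sc = new Scanner(System.in);
String text = sc.nextLine();
char[] cArr = text.toCharArray();
for (int i = 0; i < cArr.length; ++i)
{
    if (table.containsKey(cArr[i]))
    {
        cArr[i] = table.get(cArr[i]); // Character is auto-unboxed back to char
    }
}
text = new String(cArr);
System.out.println(text);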
Now, I don't have time to test that code, but you should get the idea of how to do your task.
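One way to avoid writing a put() call per key is to build the table from two strings listing the keys in the same positions. The sketch below assumes a US QWERTY layout mapped to the standard Russian ЙЦУКЕН layout and covers only the top letter row; the class and method names are hypothetical, and the Cyrillic letters are Unicode escapes so the source stays ASCII-only.

```java
import java.util.HashMap;
import java.util.Map;

public class LayoutMap {

    // Top letter row on QWERTY and the keys in the same positions on JCUKEN.
    private static final String LATIN = "qwertyuiop";
    private static final String CYRILLIC = "\u0439\u0446\u0443\u043a\u0435\u043d\u0433\u0448\u0449\u0437";

    private static final Map<Character, Character> TABLE = new HashMap<>();
    static {
        for (int i = 0; i < LATIN.length(); i++) {
            TABLE.put(LATIN.charAt(i), CYRILLIC.charAt(i));
            // upper-case pairs are derived instead of listed by hand
            TABLE.put(Character.toUpperCase(LATIN.charAt(i)), Character.toUpperCase(CYRILLIC.charAt(i)));
        }
    }

    // Characters without a mapping (digits, spaces, ...) are copied unchanged.
    public static String translate(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            out.append(TABLE.getOrDefault(c, c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("qwerty"));
    }
}
```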
