We have a text file with multiple paragraphs, each separated by the ETX character, as shown in the image below. The Java program should
read and parse each paragraph, and the end of a paragraph should be
identified by ETX.
How do I make Java recognize or detect the ETX character?
ETX ("end of text") is an ASCII control character, designed to indicate the end of a block of text.
It also has the Unicode code point U+0003, so you can find it by searching your string for this code point (in Java, the char literal '\u0003'). If you are working with raw 8-bit data, search for the byte 0x03.
The String representation of the ETX character is String.valueOf((char) 3):
File file = new File("myfile.txt");
java.util.Scanner scanner = new java.util.Scanner(file);
String etx = String.valueOf((char) 3);
scanner.useDelimiter(etx);
while (scanner.hasNext()) {
    String paragraph = scanner.next();
    System.out.println("Paragraph = " + paragraph);
}
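If you prefer to read the whole file at once, an equivalent approach is to split the text on the code point directly. A minimal sketch, assuming the file is UTF-8 encoded and small enough to hold in memory (myfile.txt is the same hypothetical file as above):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Read the entire file, then split on the ETX control character (U+0003).
String content = new String(Files.readAllBytes(Paths.get("myfile.txt")), StandardCharsets.UTF_8);
for (String paragraph : content.split("\u0003")) {
    System.out.println("Paragraph = " + paragraph);
}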
I've got a problem with input from the user. I need to save the user's input into a binary file, and when I read it back and show it on the screen it isn't working properly. I don't want to paste a few hundred lines, so I will try to describe it in a more compact form. The encoding in the NetBeans project properties is "UTF-8".
I get input from the user, in the NetBeans console or the cmd console. Then I save it into an object made up of strings and add it to an ArrayList<Ksiazka>, where Ksiazka is my class (basically a book's properties). Then I save the whole ArrayList to the file baza.bin. I do it by looping through the whole list of Ksiazka objects, taking each String one by one and writing it to baza.bin using writeUTF(oneOfStrings). When I try to read baza.bin back, I see question marks instead of the special characters (ą, ć, ę, ł, ń, ó, ś, ź). I think the problem is a difference between the encoding of the file and the input data, but to be honest I don't have any idea how to solve it.
These are the attributes of my class Ksiazka:
private String id;
private String tytul;
private String autor;
private String rok;
private String wydawnictwo;
private String gatunek;
private String opis;
private String ktoWypozyczyl;
private String kiedyWypozyczona;
private String kiedyDoOddania;
This is the method for reading data from the user:
static String podajDana(String[] tab, int coPokazac){
    System.out.print(tab[coPokazac]);
    boolean podawajDalej = true;
    String linia = "";
    Scanner klawiatura = new Scanner(System.in, "utf-8");
    do{
        try {
            podawajDalej = false;
            linia = klawiatura.nextLine();
        }
        catch(NoSuchElementException e){
            System.err.println("Wystąpił błąd w czasie podawania wartości!"
                    + " Spróbuj jeszcze raz!");
        }
        catch(IllegalStateException e){
            System.err.println("Wewnętrzny błąd programu typu 2! Zgłoś to jak najszybciej"
                    + " razem z tą wiadomością");
        }
    }while(podawajDalej);
    return linia;
}
String[] tab is just an array of strings I want to be able to show on the screen (each set/array has its own function), and int coPokazac is the index of the line from that array I want to show.
And this one saves all data from the ArrayList<Ksiazka> to the file baza.bin:
static void zapiszZmiany(ArrayList<Ksiazka> bazaKsiazek){
    try{
        RandomAccessFile plik = new RandomAccessFile("baza.bin","rw");
        for(int i = 0; i < bazaKsiazek.size(); i++){
            plik.writeUTF(bazaKsiazek.get(i).zwrocId());
            plik.writeUTF(bazaKsiazek.get(i).zwrocTytul());
            plik.writeUTF(bazaKsiazek.get(i).zwrocAutor());
            plik.writeUTF(bazaKsiazek.get(i).zwrocRok());
            plik.writeUTF(bazaKsiazek.get(i).zwrocWydawnictwo());
            plik.writeUTF(bazaKsiazek.get(i).zwrocGatunek());
            plik.writeUTF(bazaKsiazek.get(i).zwrocOpis());
            plik.writeUTF(bazaKsiazek.get(i).zwrocKtoWypozyczyl());
            plik.writeUTF(bazaKsiazek.get(i).zwrocKiedyWypozyczona());
            plik.writeUTF(bazaKsiazek.get(i).zwrocKiedyDoOddania());
        }
        plik.close();
    }
    catch (FileNotFoundException ex){
        System.err.println("Nie znaleziono pliku z bazą książek!");
    }
    catch (IOException ex){
        System.err.println("Błąd zapisu bądź odczytu pliku!");
    }
}
I think the problem is in one of those two methods (either I do something wrong while reading, or something goes wrong when saving the data with writeUTF()), but even though I tried a few things to solve it, none of them worked.
After a quick talk with my lecturer I was told that I can use at most JDK 8.
You are using different techniques for reading and writing, and they are not compatible.
Despite the name, the writeUTF method of RandomAccessFile does not write a UTF-8 string. From the documentation:
Writes a string to the file using modified UTF-8 encoding in a machine-independent manner.
First, two bytes are written to the file, starting at the current file pointer, as if by the writeShort method giving the number of bytes to follow. This value is the number of bytes actually written out, not the length of the string. Following the length, each character of the string is output, in sequence, using the modified UTF-8 encoding for each character.
writeUTF will write a two-byte length, then write the string as UTF-8, except that '\u0000' characters are written as two UTF-8 bytes and supplementary characters are written as two UTF-8 encoded surrogates, rather than single UTF-8 codepoint sequences.
On the other hand, you are trying to read that data using new Scanner(System.in, "utf-8") and klawiatura.nextLine();. This approach is not compatible because:
The text was not written as a true UTF-8 sequence.
Before the text was written, two bytes indicating its numeric length were written. They are not readable text.
writeUTF does not write a newline. It does not write any terminating sequence at all, in fact.
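For completeness: the read operation that matches writeUTF is readUTF (available on RandomAccessFile and DataInputStream), which understands both the two-byte length prefix and the modified UTF-8 encoding. A minimal sketch (the stream variable and the stop-on-EOFException pattern are illustrative, not from the original code), assuming the fields are read back in exactly the order zapiszZmiany wrote them:

try (DataInputStream wejscie = new DataInputStream(new FileInputStream("baza.bin"))) {
    while (true) {
        // each readUTF() consumes one length-prefixed, modified-UTF-8 field
        String id = wejscie.readUTF();
        String tytul = wejscie.readUTF();
        // ... read the remaining eight fields in the same order they were written
    }
} catch (EOFException e) {
    // normal end of file
} catch (IOException e) {
    System.err.println("Błąd odczytu pliku!");
}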
The best solution is to remove all usage of RandomAccessFile and replace it with a Writer:
// OutputStreamWriter lets us choose the charset explicitly and works on JDK 8
// (FileWriter only gained a Charset constructor in Java 11).
Writer plik = new OutputStreamWriter(new FileOutputStream("baza.bin"), StandardCharsets.UTF_8);
for (int i = 0; i < bazaKsiazek.size(); i++) {
    plik.write(bazaKsiazek.get(i).zwrocId());
    plik.write('\n');
    plik.write(bazaKsiazek.get(i).zwrocTytul());
    plik.write('\n');
    // ...
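The matching read side would then be a BufferedReader over the same charset, one field per line. A minimal sketch (czytnik is just an illustrative variable name), assuming none of the stored strings themselves contain newline characters:

BufferedReader czytnik = new BufferedReader(
        new InputStreamReader(new FileInputStream("baza.bin"), StandardCharsets.UTF_8));
String linia;
while ((linia = czytnik.readLine()) != null) {
    // every ten consecutive lines form one Ksiazka, in the order they were written
    System.out.println(linia);
}
czytnik.close();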
I am using Scanner to delimit tokens by ";". When a string is quoted, I need the ";" inside the quotes "" to be ignored by the Scanner. I also need to handle the "" quoting itself.
ProjectID Name Describtion Summary
ID-322;"oba stb; iba logo ""T"" ";dg-eiiod
ID-349;Sttring;dg-enc05
ID-888;Data;dg-enc05
As you can see, in the string "oba stb; iba logo ""T"" " the ";" inside the quotes is not a delimiter.
I need to make the Scanner ignore it; right now it would split that field into "oba stb" and " iba logo ""T"" ", which I don't want.
Currently I have:
scanner.useDelimiter(";|\t");
Don't use Scanner for parsing CSV files, use a CSV parser.
Almost all Java-based CSV parsers allow you to use delimiters other than commas, and a format such as RFC 4180 already handles quoted fields with embedded delimiters and doubled quotes ("").
E.g. with Apache Commons CSV (just to pick a random one):
// withFirstRecordAsHeader() is needed so that record.get("column name") can look fields up by name
CSVFormat format = CSVFormat.RFC4180.withDelimiter(';').withFirstRecordAsHeader();
Charset charset = Charset.defaultCharset(); // or StandardCharsets.UTF_8
try (CSVParser parser = CSVParser.parse(file, charset, format)) {
    for (CSVRecord record : parser) {
        String projectID = record.get("ProjectID");
        String name = record.get("Name");
        String description = record.get("Describtion");
        // ...
    }
}
Displaying a Unicode character in Java shows a "?" sign. For example, I tried to print "अ". Its Unicode code point is U+0905 and its HTML representation is "&#2309;".
The code below prints "?" instead of the Unicode character.
char aa = '\u0905';
String myString = aa + " result" ;
System.out.println(myString); // displays "? result"
Is there a way to display the Unicode character directly from the character itself, without using Unicode escape numbers? I.e. "अ" is saved in a file; now display the file in a JSP.
Java defines two types of streams, byte streams and character streams.
The main reason why System.out.println() may not show Unicode characters is that System.out is a byte stream (a PrintStream) that converts the 16-bit char values to bytes using the platform's default charset, which may not be able to represent the character.
To deal with Unicode characters more directly, you can use a character-based stream, i.e. PrintWriter.
PrintWriter supports the print() and println() methods, so you can use them in the same way as you used them with System.out.
PrintWriter printWriter = new PrintWriter(System.out,true);
char aa = '\u0905';
printWriter.println("aa = " + aa);
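Note that a PrintWriter wrapped directly around System.out still uses the platform default charset. If that default cannot represent the character, one option is to wrap an OutputStreamWriter with an explicit UTF-8 charset; a minimal sketch, assuming the console itself can render UTF-8:

PrintWriter utf8Writer = new PrintWriter(
        new OutputStreamWriter(System.out, StandardCharsets.UTF_8), true); // true = auto-flush
utf8Writer.println("aa = " + '\u0905');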
Try to use the UTF-8 character set:
Charset utf8 = Charset.forName("UTF-8");
String charToPrint = "\u0905"; // note the backslash; "u0905" would just be the literal text u0905
byte[] bytes = charToPrint.getBytes(utf8);
String message = new String(bytes, utf8); // decode with the same charset used for encoding
PrintStream printStream = new PrintStream(System.out, true, utf8.name());
printStream.println(message); // should print your character
Your myString variable contains the perfectly correct value. The problem must be the output from System.out.println(myString) which has to send some bytes to some output to show the glyphs that you want to see.
System.out is a PrintStream using the "platform default encoding" to convert characters to byte sequences - maybe your platform doesn't support that character. E.g. on my Windows 7 computer in Germany, the default encoding is CP1252, and there's no byte sequence in this encoding that corresponds to your character.
Or maybe the encoding is correct, but the font that creates graphical glyphs from characters simply doesn't have that character.
If you are sending your output to a Windows CMD.EXE window, then maybe both reasons apply.
But be assured, your string is correct, and if you send it to a destination that can handle it (e.g. a Swing JTextField), it'll show up correctly.
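One quick way to convince yourself of that is to write the string to a UTF-8 encoded file and open it in an editor that can display Devanagari; a minimal sketch using java.nio.file (the file name is arbitrary):

Path path = Paths.get("unicode-test.txt");
Files.write(path, myString.getBytes(StandardCharsets.UTF_8));
// open unicode-test.txt in a UTF-8 aware editor; the अ should be intact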
I ran into the same problem with Eclipse. I solved it by switching the encoding format for the console from ISO-8859-1 to UTF-8. You can do this under Run > Run Configurations > Common.
https://eclipsesource.com/blogs/2013/02/21/pro-tip-unicode-characters-in-the-eclipse-console/
A Unicode code point is a unique number that identifies any character or symbol.
You can look up code points at https://unicode-table.com/en/
Below is an example of printing symbols in Java.
package Basics;

/**
 *
 * @author shelc
 */
public class StringUnicode {

    public static void main(String[] args) {
        String var1 = "Cyntia";
        String var2 = new String(" is my daughter!");
        System.out.println(var1 + " \u263A" + var2);
        //printing heart using unicode
        System.out.println("Hello World \u2665");
    }
}
OUTPUT:
Cyntia ☺ is my daughter!
Hello World ♥
In the Eclipse LDAP Browser plugin, I see an attribute value that contains a UTF-8 character (a lowercase n with a ~ above it, ñ). This is the UTF-8 byte sequence C3 B1, or the UCS-2 code point 00F1, which I've read Java uses for its Strings. But when I print it to the log file with the following code, it shows up as an uppercase A with a ~ above it, followed by a +/- symbol (Ã±). All three output statements show the same thing.
while (allAttributes.hasNext()) {
    LDAPAttribute attribute = (LDAPAttribute) allAttributes.next();
    String attributeValue = new String(attribute.getStringValue());
    byte[] attByteValue = attribute.getByteValue();
    String utf8Str = new String(attByteValue, "UTF-8");

    log.debug("attribute.getStringValue=" + attribute.getStringValue());
    log.debug("attributeValue=" + attributeValue);
    log.debug("utf8Str=" + utf8Str);

    boolean isValidUTF8 = Base64.isValidUTF8(attByteValue, true);
    if (isValidUTF8) log.debug("string contains all valid UTF8 chars and UCS2 chars");
    else log.debug("string contains invalid UTF8 char(s) or invalid UCS2 char(s)");
}
isValidUTF8 returns true, so it seems there are no invalid chars. Any suggestions on how to make it display correctly in the log?
Scanner sc = new Scanner(System.in);
System.out.println("Enter text: ");
String text = sc.nextLine();

try {
    String result = new String(text.getBytes("windows-1251"), Charset.forName("UTF-8"));
    System.out.println(result);
} catch (UnsupportedEncodingException e) {
    System.out.println(e);
}
I'm trying to change the keyboard layout: Cyrillic keyboard input, Latin output. Example: qwerty => йцукен.
It doesn't work; can anyone tell me what I'm doing wrong?
First, Java text (String/char/Reader/Writer) is internally Unicode, so it can combine all scripts.
This is a major difference from, for instance, C/C++, where there is no such standard.
Now, System.in is an InputStream for historical reasons. It needs an indication of the encoding used:
Scanner sc = new Scanner(System.in, "Windows-1251");
The above explicitly sets the conversion for System.in to Cyrillic. Without this optional parameter the default encoding is taken. If that was not changed by the software, it would be the platform encoding. So this might have been correct too.
Now text is correct, containing the Cyrillic from System.in as Unicode.
You would get the UTF-8 bytes as:
byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
The old "recoding" of text was wrong; drop that line. In fact, not all Windows-1251 bytes are valid UTF-8 multi-byte sequences.
String result = text;
System.out.println(result);
System.out is a PrintStream, a rather old class. It prints using the default platform encoding, so you more or less have to rely on that default encoding being correct.
System.out.println(result);
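If the console is known to expect UTF-8 while the platform default is something else, the encoding can also be set explicitly; a minimal sketch (on Java 8 this constructor declares the checked UnsupportedEncodingException):

PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
utf8Out.println(result);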
For printing to a UTF-8 encoded file:
byte[] bytes = ("\uFEFF" + text).getBytes(StandardCharsets.UTF_8);
Path path = Paths.get("C:/Temp/test.txt");
Files.write(path, bytes); // java.nio.file.Files has write(), not writeAllBytes()
Here I have added a Unicode BOM character in front, so Windows Notepad can recognize the encoding as UTF-8. In general one should avoid using a BOM: it is a zero-width space (i.e. invisible) and plays havoc with all kinds of formats: CSV, XML, file concatenation, cut-copy-paste.
The reason why you have gotten an answer to a different question, and nobody answered yours, is that your title doesn't fit the question. You were not attempting to convert between charsets, but rather between keyboard layouts.
Here you shouldn't worry about character encodings at all: simply read the line, convert it to an array of characters, go through them, and convert each one using a predefined map.
The code will be something like this:
Map<Character, Character> table = new TreeMap<Character, Character>(); // primitive types cannot be used as type arguments
table.put('q', 'й');
table.put('Q', 'Й');
table.put('w', 'ц');
// .... etc

String text = sc.nextLine();
char[] cArr = text.toCharArray();
for (int i = 0; i < cArr.length; ++i) {
    if (table.containsKey(cArr[i])) {
        cArr[i] = table.get(cArr[i]);
    }
}
text = new String(cArr);
System.out.println(text);
Now, I don't have time to test that code, but you should get the idea of how to do your task.