How to use Ä, Ö & Å in java input?

I'm wondering how I can make the following work:
The program prints a word that contains the letters a-z plus å, ä and ö.
(At the moment å, ä and ö are printed in a weird way. I'm pretty sure you know what it looks like :D)
The user then types a word, and the program compares it to the printed word. At the moment, if the printed word contains ä, ö or å and I type exactly that word, the program doesn't see the match between the two.
So the question is: how can I make the program recognize that an å, ä or ö typed as input is exactly the same character as the one in the word it just printed? I'm comparing with
answer.equals(rightanswer)
Here's my whole code :D Mostly just questions and answers :)
import java.io.*;
import java.awt.*;
public class sanaopisto {
public static int quanity;
public static String rightanswer;
public static String question;
public static int right;
public static int wrong;
public static double ratio;
public static void main(String[] args) {
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
try{
System.out.print("Moneenko sanaan tahdot vastata? ");
quanity = Integer.parseInt(in.readLine());
for(int x=0; x<quanity; x++){
System.out.println(x+1 +". kysymys");
getquestion(quanity);
}
System.out.println("Oikeita vastauksia " +right +" ja v\u201e\u201eri\u201e " +wrong +".");
}catch(Exception e) {
System.out.println("Tapahtui virhe.");}}
public static void getquestion(int quanity) {
try{
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
int[] done = new int[100];//create array but everything is null
for(int i = 0; i<done.length; i++)
{
done[i] = 0;//need default values else wise it'll just be NULL!!!
}
//must be done before the do-while loop starts
boolean allDone = false;
String answer;
int ran;
if (!areAllQuestionsComplete(done)){ //Changed (!areAllQuestionsComplete(done)) thingy like this..
do{ //And made this work properly etc.
ran = (int)(Math.random() * 53 + 1);
} while (done[ran] == 1);
if(done[ran] != 1)
{
//ask random question
//if answer is correct, set done[ran] = 1
//else just let do-while loop run
if (ran == 1) { //1
question = "ruotsalainen";
rightanswer = "svensk, -t, -a";}
if (ran == 2) { //2
question = "suomalainen";
rightanswer = "finländsk, -t, -a";}
//.
//. Took some code away from here.. Because too many questions.. In real version I have all the 1-84 questions :D
//.
if (ran == 83) { //15
question = "globalisoitunut";
rightanswer = "globaliserad, -at, -ade";}
if (ran == 84) { //15
question = "maailma";
rightanswer = "en värld, -en, -ar, -arna";}
}
System.out.println(question);
System.out.print("Vastaus?: ");
answer = in.readLine();
if (answer.equals(rightanswer)){
right++;
System.out.println("Oikein!\n");
done[ran] = 1;}
else{wrong++;
System.out.println("Oikea vastaus on: " +rightanswer +"\n");}
//check if all questions are answered}
else {
System.out.println("You have answered every question!"); //I know that this is useless.. :D
}
}catch(Exception e) {
System.out.println("You made a mistake.");}
}
private static boolean areAllQuestionsComplete(int[] list)
{
for(int i = 0; i<list.length; i++)
{
if(list[i] != 1)
{
return false;//found one false, then all false
}
}
return true;//if it makes it here, then you know its all done
}
}
Edit: Added the whole code (with some of the questions removed). I'm using CMD.

I'm guessing you are using System.out and System.in, which use the system's default encoding.
In the Windows command line this is some kind of DOS code page, depending on your computer settings.
So to read and print characters such as ä, ö and å the way you want, you have to change your command line encoding (e.g. tell DOS to use a different code page) and make Java use the same encoding.
To answer exactly how this can be done, one would need more information about your operating system.
On the Java side you can use an InputStreamReader and pass the character set (encoding) to its constructor for reading, and a PrintStream (given the same encoding) for writing.
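A minimal sketch of that last point, under stated assumptions: the class and method names (ConsoleEncodingSketch, readerFor, printerFor) are made up for illustration, and ISO-8859-1 only stands in for whatever code page your console really uses (on Windows, `chcp` shows it). To keep the example self-contained it decodes bytes from memory; in the real program you would pass System.in and System.out instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PrintStream;

public class ConsoleEncodingSketch {

    // Wraps a byte stream (System.in in real use) in a reader that decodes with the given charset.
    public static BufferedReader readerFor(InputStream in, String charsetName) throws IOException {
        return new BufferedReader(new InputStreamReader(in, charsetName));
    }

    // Wraps a byte stream (System.out in real use) in a PrintStream that encodes with the given charset.
    public static PrintStream printerFor(OutputStream out, String charsetName) throws IOException {
        return new PrintStream(out, true, charsetName);
    }

    public static void main(String[] args) throws IOException {
        // The bytes a console using ISO-8859-1 would deliver for the typed word (\u00e4 is a-umlaut).
        byte[] typed = "finl\u00e4ndsk\n".getBytes("ISO-8859-1");
        String rightAnswer = "finl\u00e4ndsk";

        // Decoding with the charset the bytes were written in gives back the same string.
        String decodedWell = readerFor(new ByteArrayInputStream(typed), "ISO-8859-1").readLine();
        // Decoding with a different charset mangles the umlaut, so equals() fails.
        String decodedBadly = readerFor(new ByteArrayInputStream(typed), "UTF-8").readLine();

        // Assumption: the terminal showing this output expects UTF-8.
        PrintStream out = printerFor(System.out, "UTF-8");
        out.println("same charset matches:  " + rightAnswer.equals(decodedWell));
        out.println("wrong charset matches: " + rightAnswer.equals(decodedBadly));
    }
}
```

The point of the sketch is only that equals() can succeed once the reader decodes with the same charset the console used to encode the typed bytes; which charset that is depends on your machine.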

UTF-8 should enable you to use them.
You're Finnish right? :)
If you try that:
BufferedReader in = new BufferedReader(new InputStreamReader(System.in, "UTF-8"));
How does that work for you?
EDIT:
Somehow UTF-8, which should work, doesn't seem to do the trick. I also tried -Dfile.encoding=UTF8 as a JVM property, but that didn't work for me either.
So I tried basically all the available charsets, and a few of them gave the correct characters. Here are the charset names:
x-ISO-2022-CN-GB, x-ISO-2022-CN-CNS, x-IBM922, windows-1258, windows-1254, windows-1252,
ISO-8859-9, ISO-8859-4, ISO-8859-1, ISO-2022-KR and ISO-2022-CN
So if you try for example:
BufferedReader in = new BufferedReader(new InputStreamReader(System.in, "x-ISO-2022-CN-GB"));
It should work

Related

Is there any way to let the program recognize "\n" in text files as line break code?

I've been creating a game in Java for a while and I used to write all the in-game texts directly in my code like this:
String text001 = "You're in the castle.\n\nWhere do you go next?"
But recently I decided to write all the in-game texts in a text file and tried to let the program read them and put them into a String array since the amount of the texts has increased a lot and it made my code incredibly long. The reading went well except one thing. I've inserted line break codes in dialogues and although the code worked properly when I wrote it directly in my code, they are no longer recognized as line break code when I try to read them from a text file.
It is supposed to be displayed as:
You're in the castle.
Where do you go next?
But now it is displayed as:
You're in the castle.\n\nWhere do you go next?
The code doesn't recognize "\n" as line break code any more.
Here's the code :
import java.io.File;
import java.util.Scanner;
import java.util.StringTokenizer;
public class Main {
public static void main(String[] args) {
new Main();
}
public Main() {
Scanner sc;
StringTokenizer token;
String line;
int lineNumber = 1;
String id[] = new String[100];
String text[] = new String[100];
try {
sc = new Scanner(new File("sample.txt"));
while ((line = sc.nextLine()) != null) {
token = new StringTokenizer(line, "|");
while (token.hasMoreTokens()) {
id[lineNumber] = token.nextToken();
text[lineNumber] = token.nextToken();
lineNumber++;
}
}
} catch (Exception e) {
}
System.out.println(text[1]);
String text001 = "You're in the castle.\n\nWhere do you go next?";
System.out.println(text001);
}
}
And this is the content of the text file:
castle|You're in the castle.\n\nWhere do you go next?
inn|You're in the inn. \n\nWhere do you go next?
I would be grateful if anyone tells me how to fix this. Thank you.

Just use
text[lineNumber] = token.nextToken().replace("\\n", "\n");
There is nothing inherently special about \n in a text file. It is just a \ followed by an n.
It is only Java (and other languages) that define that this sequence of characters, when it appears in a char or string literal in source code, should be interpreted as a 0x0a (ASCII newline) character.
So, you can replace the character sequence with the one you want it to be interpreted as.
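To make the difference concrete, here is a small sketch. The class and method names (EscapeSketch, unescapeNewlines) are made up for illustration. Note that in Java source the literal "\\n" is two characters (a backslash and an n), which is exactly what a line read from the text file contains, while "\n" is a single newline character.

```java
public class EscapeSketch {

    // Turns the two-character sequence backslash + 'n' into one real newline character.
    public static String unescapeNewlines(String raw) {
        return raw.replace("\\n", "\n");
    }

    public static void main(String[] args) {
        // What the program gets when it reads the line from the file:
        // a backslash followed by the letter n, written here as "\\n".
        String fromFile = "You're in the castle.\\n\\nWhere do you go next?";

        // Still shown on one line, with the backslashes visible.
        System.out.println(fromFile);

        // Now shown on separate lines with a blank line between them.
        System.out.println(unescapeNewlines(fromFile));
    }
}
```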

Split paragraphs into sentences - a special case

I am a newbie to programming in Java. I want to split the paragraphs in one file into sentences and write them to a different file. There should also be a mechanism to identify which sentence comes from which paragraph. The code I have used so far is shown below. But this code breaks:
Former Secretary of Finance Dr. P.B. Jayasundera is being questioned by the police Financial Crime Investigation Division.
into
Former Secretary of Finance Dr.
P.B.
Jayasundera is being questioned by the police Financial Crime Investigation Division.
How can I correct it? Thanks in advance.
import java.io.*;
class trial4{
public static void main(String args[]) throws IOException
{
FileReader fr = new FileReader("input.txt");
BufferedReader br = new BufferedReader(fr);
String s;
OutputStream out = new FileOutputStream("output10.txt");
String token[];
while((s = br.readLine()) != null)
{
token = s.split("(?<=[.!?])\\s* ");
for(int i=0;i<token.length;i++)
{
byte buf[]=token[i].getBytes();
for(int j=0;j<buf.length;j=j+1)
{
out.write(buf[j]);
if(j==buf.length-1)
out.write('\n');
}
}
}
fr.close();
}
}
I referenced all the similar questions posted on StackOverFlow. But those answers couldn't help me solve this.

How about using a negative lookbehind in conjunction with a replace? Simply said: replace every sentence-ending character that doesn't have "something special" before it with that character followed by a newline.
A list of "known abbreviations" will be needed. There's no guarantee as to how long those can be or how short a word at the end of a sentence might be. (See? 'be' is quite short already!)
import java.io.*;
class trial4{
public static void main(String args[]) throws IOException {
FileReader fr = new FileReader("input.txt");
BufferedReader br = new BufferedReader(fr);
PrintStream out = new PrintStream(new FileOutputStream("output10.txt"));
String s = br.readLine();
while(s != null) {
out.print( //print, not println: the only newlines written are the ones added by the replacement
s.replaceAll("(?i)" //Make the match case insensitive
+ "(?<!" //Negative lookbehind
+ "(\\W\\w)|" //Single non-word followed by word character (P.B.)
+ "(\\W\\d{1,2})|" //one or two digits (dates!)
+ "(\\W(dr|mr|mrs|ms))" //List of known abbreviations
+ ")" //End of lookbehind
+"([!?\\.])" //Match end-ofsentence
, "$5" //Replace with end-of-sentence found
+System.lineSeparator())); //Add newline if found
s = br.readLine();
}
out.close(); //Flush and close the output file
br.close();
}
}

As mentioned in the comment, "it will be reasonable hard" to break text into sentences without formalizing the requirements. Take a look at BreakIterator, especially its sentence instance. With the default rules it breaks in much the same places as your regexp does, only through a more abstract interface, so you might need to roll your own BreakIterator. Or try to find a 3rd party solution like http://deeplearning4j.org/sentenceiterator.html which can be trained to tokenize your input.
Example with BreakIterator:
String str = "Former Secretary of Finance Dr. P.B. Jayasundera is being questioned by the police Financial Crime Investigation Division.";
BreakIterator bilus = BreakIterator.getSentenceInstance(Locale.US);
bilus.setText(str);
int last = bilus.first();
int count = 0;
while (BreakIterator.DONE != last)
{
int first = last;
last = bilus.next();
if (BreakIterator.DONE != last)
{
String sentence = str.substring(first, last);
System.out.println("Sentence:" + sentence);
count++;
}
}
System.out.println("" + count + " sentences found.");

How to take first word of new paragraph into consideration?

I'm trying to build a program that takes in files and outputs the number of words in the file. It works perfectly when everything is in one paragraph. However, when there are multiple paragraphs, it doesn't take into account the first word of each new paragraph. For example, if a file reads "My name is John", the program will output "4 words". However, if a file reads "My Name Is John" with each word being a new paragraph, the program will output "1 word". I know it must be something about my if statement, but I assumed that there are spaces before the new paragraph that would take the first word in a new paragraph into account.
Here is my code in general:
import java.io.*;
public class HelloWorld
{
public static void main(String[]args)
{
try{
// Open the file that is the first
// command line parameter
FileInputStream fstream = new FileInputStream("health.txt");
// Use DataInputStream to read binary NOT text.
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
int word2 =0;
int word3 =0;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
;
int wordLength = strLine.length();
System.out.println(strLine);
for(int i = 0 ; i < wordLength -1 ; i++)
{
Character a = strLine.charAt(i);
Character b= strLine.charAt(i + 1);
**if(a == ' ' && b != '.' &&b != '?' && b != '!' && b != ' ' )**
{
word2++;
//doesnt take into account 1st character of new paragraph
}
}
word3 = word2 + 1;
}
System.out.println("There are " + word3 + " "
+ "words in your file.");
//Close the input stream
in.close();
}catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
}
I've tried adjusting the if statement multiple times, but it does not seem to make a difference. Does anyone know where I'm messing up?
I'm a pretty new user and asked a similar question a couple days back with people accusing me of demanding too much of users, so hopefully this narrows my question a bit. I just am really confused on why its not taking into account the first word of a new paragraph. Please let me know if you need any more information. Thanks!!

Firstly, your counting logic is incorrect. Consider:
word3 = word2 + 1;
Think about what this does. Every time through your loop, when you read a line, you essentially count the words in that line, then reset the total count to word2 + 1. Hint: If you want to count the total number in the file, you'd want to increment word3 each time, rather than replace it with the current line's word count.
Secondly, your word parsing logic is slightly off. Consider the case of a blank line. You would see no words in it, but you treat the word count in the line as word2 + 1, which means you are incorrectly counting a blank line as 1 word. Hint: If the very first character on the line is a letter, then the line starts with a word.
Your approach is reasonable although your implementation is slightly flawed. As an alternate option, you may want to consider String.split() on each line. The number of elements in the resulting array is the number of words on the line.
By the way, you can increase readability of your code, and make debugging easier, if you use meaningful names for your variables (e.g. totalWords instead of word3).
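A minimal sketch of the String.split() option with a running total, assuming words are separated by whitespace. The names (WordCountSketch, countWordsInLine, countWordsInText, totalWords) are made up for illustration, and the sample text is read from a StringReader instead of health.txt so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class WordCountSketch {

    // Counts the words on one line; a blank line counts as zero words.
    public static int countWordsInLine(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        // Split on runs of whitespace so double spaces don't create empty "words".
        return trimmed.split("\\s+").length;
    }

    // Adds up the per-line counts instead of overwriting the total on every line.
    public static int countWordsInText(BufferedReader reader) throws IOException {
        int totalWords = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            totalWords += countWordsInLine(line);
        }
        return totalWords;
    }

    public static void main(String[] args) throws IOException {
        // Each word on its own line, plus one blank line in between.
        String text = "My\nName\n\nIs\nJohn";
        BufferedReader reader = new BufferedReader(new StringReader(text));
        System.out.println("There are " + countWordsInText(reader) + " words in your file.");
    }
}
```

With the sample text above this prints "There are 4 words in your file."; to use it on a real file, a BufferedReader over a FileReader would replace the StringReader.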

If a paragraph does not start with whitespace, your if condition won't count its first word.
For "My name is John" the program does output "4 words", but only by accident: the loop misses the first word, and the + 1 afterwards makes up for it.
Try this:
// strLine is the line just read with br.readLine()
strLine = strLine.trim(); // remove leading and trailing whitespace
String[] words = strLine.split("\\s+"); // split on runs of whitespace
int numOfWords = strLine.isEmpty() ? 0 : words.length; // a blank line contains no words

I personally prefer a regular Scanner with token-based scanning for this sort of thing. How about something like this:
int words = 0;
Scanner lineScan = new Scanner(new File("fileName.txt"));
while (lineScan.hasNextLine()) {
Scanner tokenScan = new Scanner(lineScan.nextLine());
while (tokenScan.hasNext()) {
tokenScan.next();
words++;
}
}
lineScan.close();
This iterates through every line in the file. And for every line in the file, it iterates through every token (in this case words) and increments the word count.

I am not sure what you mean by "paragraph"; however, I tried to use capital letters as you suggested and it worked perfectly fine. I used the Apache Commons IO library:
package Project1;
import java.io.*;
import org.apache.commons.io.*;
public class HelloWorld
{
private static String fileStr = "";
private static String[] tokens;
public static void main(String[]args)
{
try{
// Open the file that is the first
// command line parameter
try {
File f = new File("c:\\TestFile\\test.txt");
fileStr = FileUtils.readFileToString(f);
tokens = fileStr.split(" ");
System.out.println("Words in file : " + tokens.length);
}
catch(Exception ex){
System.out.println(ex);
}
}catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
}

Remove integers at the start of multiple line

I have a txt file with over a thousand lines of text, each of which starts with some integers.
Like:
22Ahmedabad, AES Institute of Computer Studies
526Ahmedabad, Indian Institute of Managment
561Ahmedabad, Indus Institute of Technology & Engineering
745Ahmedabad, Lalbhai Dalpatbhai College of Engineering
I want to store all the lines in another file without the integers.
The code I've written is:
while (s.hasNextLine()){
String sentence=s.nextLine();
int l=sentence.length();
c++;
try{//printing P
FileOutputStream ffs = new FileOutputStream ("ps.txt",true);
PrintStream p = new PrintStream ( ffs );
for (int i=0;i<l;i++){
if ((int)sentence.charAt(i)<=48 && (int)sentence.charAt(i)>=57){
p.print(sentence.charAt(i));
}
}
p.close();
}
catch(Exception e){}
}
But it outputs a blank file.

There are a couple of things in your code that should be improved:
Don't re-open the output file with every line. Just keep it open the whole time.
You are removing all numbers, not just numbers at the beginning - is that your intention?
Do you know any number that is both <= 48 and >= 57 at the same time? That condition is never true, so nothing is ever printed.
Scanner.nextLine() does not include line returns, so you'll need a call to p.println() after every line.
Try this:
// open the file once
FileOutputStream ffs = new FileOutputStream ("ps.txt");
PrintStream p = new PrintStream ( ffs );
while (s.hasNextLine()){
String sentence=s.nextLine();
int l=sentence.length();
c++;
try{//printing P
for (int i=0;i<l;i++){
// check "< 48 || > 57", which is non-numeric range
if ((int)sentence.charAt(i)<48 || (int)sentence.charAt(i)>57){
p.print(sentence.charAt(i));
}
}
// move to next line in output file
p.println();
}
catch(Exception e){}
}
p.close();

You can apply this regular expression to each line that you read from the file:
String str = ... // read the next line from the file
str = str.replaceAll("^[0-9]+", "");
The regular expression ^[0-9]+ matches any number of digits at the beginning of the line. replaceAll method replaces the match with an empty string.

On top of mellamokb's comments, you should avoid "magic numbers". There's no guarantee that the digits will fall within the expected range of ASCII codes.
You can simply detect if a character is a digit using Character.isDigit
String value = "22Ahmedabad, AES Institute of Computer Studies";
int index = 0;
while (index < value.length() && Character.isDigit(value.charAt(index))) { // stop at the end of the string
index++;
}
if (index < value.length()) {
System.out.println(value.substring(index));
} else {
System.out.println("Nothing but numbers here");
}
(N.B. dasblinkenlight has posted an excellent regular expression, which would probably be easier to use, but if you're like me, regexp turns your brain inside out :P)

Preserving line breaks and spacing in file IO

I am working on a pretty neat programming challenge that involves reading words from a .txt file. The program must allow for ANY .txt file to be read, so it cannot predict what words it will be dealing with.
Then it converts the words to their "Pig Latin" counterparts and writes them into a new file. There are a lot more requirements to this problem, but suffice it to say I have every part solved save one: when printing to the new file I am unable to preserve the line breaks. That is to say, if line 1 has 5 words followed by a line break and line 2 has 3 words followed by a line break, the same must be true for the new file. As it stands now, it all works, but the converted words are all listed one after the other.
I am interested in learning this, so I am OK if you all wish to play coy in your answers. Although I have been at this for 9 hours, so "semi-coy" will be appreciated as well :) Please pay close attention to the "while" statements in the code; that is where the file IO action is happening. I am wondering if I need to use the scanner's nextLine() method to get each line as a string, and then make substrings of that string to convert the words one at a time. The substrings could be splits or tokens, or something else. I am unclear on this part, and my token attempts are giving me "java.util.NoSuchElementException" exceptions. I do not seem to understand the correct call for a split command. I tried something like String a = scan.nextLine(), where "scan" is my scanner variable, and then String b = a.split(), but no go. Anyway, here is my code; see if you can figure out what I am missing.
Here is the code, and thank you very much in advance, Java gods....
import java.util.*;
import javax.swing.*;
import java.io.*;
import java.text.*;
public class PigLatinTranslator
{
static final String ay = "ay"; // "ay" is added to the end of every word in pig latin
public static void main(String [] args) throws IOException
{
File nonPiggedFile = new File(...);
String nonPiggedFileName = nonPiggedFile.getName();
Scanner scan = new Scanner(nonPiggedFile);
nonPiggedFileName = ...;
File pigLatinFile = new File(nonPiggedFileName + "-pigLatin.txt"); //references a file that may or may not exist yet
pigLatinFile.createNewFile();
FileWriter newPigLatinFile = new FileWriter(nonPiggedFileName + "-pigLatin.txt", true);
PrintWriter PrintToPLF = new PrintWriter(newPigLatinFile);
while (scan.hasNext())
{
boolean next;
while (next = scan.hasNext())
{
String nonPig = scan.next();
nonPig = nonPig.toLowerCase();
StringBuilder PigLatWord = new StringBuilder(nonPig);
PigLatWord.insert(nonPig.length(), nonPig.charAt(0) );
PigLatWord.insert(nonPig.length() + 1, ay);
PigLatWord.deleteCharAt(0);
String plw = PigLatWord.toString();
if (plw.contains("!") )
{
plw = plw.replace("!", "") + "!";
}
if (plw.contains(".") )
{
plw = plw.replace(".", "") + ".";
}
if (plw.contains("?") )
{
plw = plw.replace("?", "") + "?";
}
PrintToPLF.print(plw + " ");
}
PrintToPLF.close();
}
}
}

Use BufferedReader, not Scanner. http://docs.oracle.com/javase/6/docs/api/java/io/BufferedReader.html
I leave that part of it as an exercise for the original poster, it's easy once you know the right class to use! (And hopefully you learn something instead of copy-pasting my code).
Then pass the entire line into functions like this: (note this does not correctly handle quotes as it puts all non-apostrophe punctuation at the end of the word). Also it assumes that punctuation is supposed to go at the end of the word.
private static final String vowels = "AEIOUaeiou";
private static final String punct = ".,!?";
public static String pigifyLine(String oneLine) {
StringBuilder pigified = new StringBuilder();
boolean first = true;
for (String word : oneLine.split(" ")) {
if (!first) pigified.append(" ");
pigified.append(pigify(word));
first = false;
}
return pigified.toString();
}
public static String pigify(String oneWord) {
char[] chars = oneWord.toCharArray();
StringBuilder consonants = new StringBuilder();
StringBuilder newWord = new StringBuilder();
StringBuilder punctuation = new StringBuilder();
boolean consDone = false; // set to true when the first consonant group is done
for (int i = 0; i < chars.length; i++) {
// consonant
if (vowels.indexOf(chars[i]) == -1) {
// punctuation
if (punct.indexOf(chars[i]) > -1) {
punctuation.append(chars[i]);
consDone = true;
} else {
if (!consDone) { // we haven't found the consonants
consonants.append(chars[i]);
} else {
newWord.append(chars[i]);
}
}
} else {
consDone = true;
// vowel
newWord.append(chars[i]);
}
}
if (consonants.length() == 0) {
// vowel words are "about" -> "aboutway"
consonants.append("w");
}
consonants.append("ay");
return newWord.append(consonants).append(punctuation).toString();
}

You could try to store the count of words per line in a separate data structure, and use that as a guide for when to move on to the next line when writing the file.
I purposely made this semi-vague for you, but can elaborate on request.
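For what it's worth, here is one way the idea above could be sketched. Everything in it is an assumption for illustration: the class and method names (LineCountSketch, toPigLatin, translate) are made up, toPigLatin is only a stand-in for the real conversion (first letter moved to the end plus "ay", no punctuation handling), and the text is passed in as a String rather than read from a file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class LineCountSketch {

    // Stand-in for the real translation: first letter moved to the end, then "ay".
    public static String toPigLatin(String word) {
        return word.substring(1) + word.charAt(0) + "ay";
    }

    public static String translate(String text) {
        List<Integer> wordsPerLine = new ArrayList<>(); // how many words each input line had
        List<String> words = new ArrayList<>();         // all words, in reading order

        // First pass: collect the words and remember the word count of every line.
        Scanner lines = new Scanner(text);
        while (lines.hasNextLine()) {
            Scanner tokens = new Scanner(lines.nextLine());
            int count = 0;
            while (tokens.hasNext()) {
                words.add(tokens.next());
                count++;
            }
            wordsPerLine.add(count);
        }

        // Second pass: write the converted words, breaking the line
        // whenever the recorded count for the current line is used up.
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (int count : wordsPerLine) {
            for (int i = 0; i < count; i++) {
                if (i > 0) {
                    out.append(' ');
                }
                out.append(toPigLatin(words.get(next++)));
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(translate("hello big world\nfoo bar"));
    }
}
```

The recorded counts are what carry the line structure from the input over to the output; a blank input line has a count of zero and so comes out as a blank line.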
