How to count occurrences of Polish characters in a .txt file - java
I have to prepare a .txt file and count how many times each character of alphabet occurs in the file. I've found a very nice piece of code, but unfortunately, it doesn't work with Polish characters like ą,ę,ć,ó,ż,ź. Even though I put them in the array, for some reason they are not found in the .txt file so the output is 0.
Does anyone know why? Maybe I should count them differently, with "Switch" or something similar.
Before anyone asks - yes, the .txt file is saved with UTF-8 :)
public static void main(String[] args) throws FileNotFoundException {
    int ch;
    BufferedReader reader;
    try {
        int counter = 0;
        for (char a : "AĄĆĘÓBCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray()) {
            reader = new BufferedReader(new FileReader("C:\\Users\\User\\Desktop\\pan.txt"));
            char toSearch = a;
            counter = 0;
            try {
                while ((ch = reader.read()) != -1) {
                    if (a == Character.toUpperCase((char) ch)) {
                        counter++;
                    }
                }
            } catch (IOException e) {
                System.out.println("Error");
                e.printStackTrace();
            }
            System.out.println(toSearch + " occurs " + counter);
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}
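This looks like an encoding problem: FileReader decodes the file with the platform's default charset, which is probably not UTF-8 on your machine.
Try creating the reader with an explicit charset instead (this needs imports for java.io.FileInputStream, java.io.InputStreamReader and java.nio.charset.StandardCharsets):
reader = new BufferedReader(new InputStreamReader(new FileInputStream("C:\\Users\\User\\Desktop\\pan.txt"), StandardCharsets.UTF_8));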
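A minimal sketch of the same idea, assuming the file really is UTF-8: it decodes with an explicit charset and counts every letter in a single pass instead of re-reading the file once per letter. The class name PolishLetterCounter and the method countLetters are hypothetical names chosen for illustration, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class PolishLetterCounter {

    // Counts upper-cased letters from any Reader; the caller decides the charset.
    public static Map<Character, Integer> countLetters(Reader in) throws IOException {
        Map<Character, Integer> counts = new TreeMap<>();
        int ch;
        while ((ch = in.read()) != -1) {
            if (Character.isLetter(ch)) {
                char upper = Character.toUpperCase((char) ch);
                counts.merge(upper, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the question; adjust as needed.
        try (BufferedReader reader = Files.newBufferedReader(
                Paths.get("C:\\Users\\User\\Desktop\\pan.txt"), StandardCharsets.UTF_8)) {
            countLetters(reader).forEach((letter, n) ->
                    System.out.println(letter + " occurs " + n));
        }
    }
}
```

Because the map is filled from whatever letters appear in the file, letters missing from a hand-written alphabet string (for example Ł, Ń, Ś, Ź, Ż) are counted as well.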
Alternatively, try NIO. The code below uses RandomAccessFile and a MappedByteBuffer, which can be faster for large files:
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class FileReadNio
{
    public static void main(String[] args) throws IOException
    {
        Map<Character, Integer> charCountMap = new HashMap<>();
        // try-with-resources closes the channel and the file even if an exception is thrown
        try (RandomAccessFile rndFile = new RandomAccessFile("c:\\test123.txt", "r");
             FileChannel inChannel = rndFile.getChannel())
        {
            MappedByteBuffer buffer = inChannel.map(FileChannel.MapMode.READ_ONLY, 0, inChannel.size());
            buffer.load();
            // Decode the bytes as UTF-8 first: casting a single byte to char
            // breaks multi-byte characters such as ą, ę, ć, ó, ż, ź
            CharBuffer chars = StandardCharsets.UTF_8.decode(buffer);
            while (chars.hasRemaining())
            {
                charCountMap.merge(chars.get(), 1, Integer::sum);
            }
        }
        for (Map.Entry<Character, Integer> entry : charCountMap.entrySet()) {
            System.out.printf("char: %s :: count=%d%n", entry.getKey(), entry.getValue());
        }
    }
}
Related
Indexing through an array to return any specific value in java
I have written code that reads a CSV file line by line, splits each line into its individual values and puts them into an array, but I am stuck on indexing a value from the array I have created. The CSV file and my code are below. For example, how would I access the value at [3,4], which should be Andorra, and [6,6], which should be 17?
CSV FILE:
Date,iso3,Continent,CountryName,lat,lon,CumulativePositive,CumulativeDeceased,CumulativeRecovered,CurrentlyPositive,Hospitalized,IntensiveCare,NUTS
31/1/2021,AFG,AS,Afghanistan,33.930445,67.678945,55023,2400,,52623,,,AF
31/1/2021,ALB,EU,Albania,41.156986,20.181222,78127,1380,47424,29323,324,19,AL
31/1/2021,DZA,AF,Algeria,28.026875,1.65284,107122,2888,,104234,,,DZ
31/1/2021,AND,EU,Andorra,42.542268,1.596865,9937,101,,9836,44,,AD
31/1/2021,AGO,AF,Angola,-11.209451,17.880669,19782,464,,19318,,,AO
31/1/2021,AIA,NA,Anguilla,18.225119,-63.07213,17,0,,17,,,AI
31/1/2021,ATG,NA,Antigua and Barbuda,17.363183,-61.789423,218,7,,211,,,AG
31/1/2021,ARG,SA,Argentina,-38.421295,-63.587403,1915362,47775,,1867587,,,AR
31/1/2021,ARM,AS,Armenia,40.066181,45.111108,167026,3080,,163946,,,AM
31/1/2021,ABW,NA,Aruba,12.517713,-69.965112,6858,58,,6800,,,AW
31/1/2021,AUS,OC,Australia,-26.853388,133.275154,28806,909,,27897,,,AU
31/1/2021,AUT,EU,Austria,47.697542,13.349319,411921,7850,383158,21058,1387,297,AT
31/1/2021,AZE,AS,Azerbaijan,40.147396,47.572098,229935,3119,,226816,,,AZ
31/1/2021,BHS,NA,Bahamas,24.885993,-76.709892,8174,176,,7998,,,BS
31/1/2021,BHR,AS,Bahrain,26.039722,50.559306,102626,372,,102254,,,BH
31/1/2021,BGD,AS,Bangladesh,23.68764,90.351002,535139,8127,,527012,,,BD
31/1/2021,BRB,NA,Barbados,13.18355,-59.534649,1498,12,,1486,,,BB
31/1/2021,BLR,EU,Belarus,53.711111,27.973847,248336,1718,,246618,,,BY
31/1/2021,BEL,EU,Belgium,50.499527,4.475402,711417,21118,,690299,1788,315,BE
31/1/2021,BLZ,NA,Belize,17.192929,-88.5009,11877,301,,11576,,,BZ
31/1/2021,BEN,AF,Benin,9.322048,2.313138,3786,48,,3738,,,BJ
31/1/2021,BMU,NA,Bermuda,32.320236,-64.774022,691,12,,679,,,BM
31/1/2021,BTN,AS,Bhutan,27.515709,90.442455,859,1,,858,,,BT
31/1/2021,BWA,AF,Botswana,-22.344029,24.680158,21293,134,,21159,,,BW
31/1/2021,BRA,SA,Brazil,-14.242915,-53.189267,9118513,222666,,8895847,,,BR
31/1/2021,VGB,NA,British Virgin Islands,18.573601,-64.492065,141,1,,140,,,VG
CODE:
public static String readFile(String file) {
    FileInputStream fileStream = null;
    InputStreamReader isr;
    BufferedReader bufRdr;
    int lineNum;
    String line = null;
    try {
        fileStream = new FileInputStream(file);
        isr = new InputStreamReader(fileStream);
        bufRdr = new BufferedReader(isr);
        lineNum = 0;
        line = bufRdr.readLine();
        while ((line != null) && lineNum < 27) {
            lineNum++;
            System.out.println(line);
            line = bufRdr.readLine();
        }
        fileStream.close();
    } catch (IOException e) {
        if (fileStream != null) {
            try {
                fileStream.close();
            } catch (IOException ex2) { }
        }
        System.out.println("Error: " + e.getMessage());
    }
    return line;
}

private static void processLine(String line) {
    String[] splitLine;
    splitLine = line.split(",");
    int lineLength = splitLine.length;
    for (int i = 0; i < lineLength; i++) {
        System.out.print(splitLine[i] + " ");
    }
    System.out.println("");
}
You need to create a 2D array in readFile. As the file is read and each line is split by processLine, insert the resulting array into the 2D array. At the end, readFile returns the 2D array. Change processLine so that it returns a string array, namely the result of the split. I marked where I made changes to your code.
import java.io.*;

public class Main {
    public static void main(String[] args){
        String[][] data = readFile("data.txt");
        System.out.println(data[3][4]);
        System.out.println(data[6][6]);
    }

    public static String[][] readFile(String file) { //<<< changed
        FileInputStream fileStream = null;
        InputStreamReader isr;
        BufferedReader bufRdr;
        int lineNum;
        String line = null;
        String[][] data = new String[28][]; //<<< added
        try {
            fileStream = new FileInputStream(file);
            isr = new InputStreamReader(fileStream);
            bufRdr = new BufferedReader(isr);
            lineNum = 0;
            line = bufRdr.readLine();
            while (lineNum < 27) { // <<< changed
                System.out.println(line);
                line = bufRdr.readLine();
                if (line == null) break; // <<< added
                data[lineNum++] = processLine(line); // <<< added
            }
            fileStream.close();
        } catch (IOException e) {
            if (fileStream != null) {
                try {
                    fileStream.close();
                } catch (IOException ex2) { }
            }
            System.out.println("Error: " + e.getMessage());
        }
        return data; //added
    }

    private static String[] processLine(String line) { //<< changed
        String[] splitLine;
        splitLine = line.split(",");
        int lineLength = splitLine.length;
        for (int i = 0; i < lineLength; i++) {
            System.out.print(splitLine[i] + " ");
        }
        System.out.println("");
        return splitLine; // <<< added
    }
}
You can do it quite simply using the stream API.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CsvTest0 {
    public static void main(String[] args) {
        Path path = Paths.get("geografy.csv");
        try (Stream<String> lines = Files.lines(path)) {
            String[][] arr = lines.skip(1L)
                                  .limit(27L)
                                  .map(l -> l.split(","))
                                  .collect(Collectors.toList())
                                  .toArray(new String[][]{});
            System.out.println(arr[3][3]);
            System.out.println(arr[5][6]);
        } catch (IOException xIo) {
            xIo.printStackTrace();
        }
    }
}
However, regarding the code in your question, below is a fixed version followed by notes and explanations.
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CsvTest1 {
    public static String[][] readFile(String file) throws IOException {
        Path path = Paths.get(file);
        String[][] arr = new String[27][];
        int lineNum;
        String line = null;
        try (BufferedReader bufRdr = Files.newBufferedReader(path)) {
            lineNum = 0;
            line = bufRdr.readLine(); // Ignore first line of file since it contains headings only.
            line = bufRdr.readLine();
            while ((line != null) && lineNum < 27) {
                arr[lineNum++] = processLine(line);
                line = bufRdr.readLine();
            }
        }
        return arr;
    }

    private static String[] processLine(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        try {
            String[][] arr = readFile("geografy.csv");
            System.out.println(arr[3][3]);
            System.out.println(arr[5][6]);
        } catch (IOException x) {
            x.printStackTrace();
        }
    }
}
Note that the points below are not in any particular order. I wrote them as they came to me.
There is no need for FileInputStream and InputStreamReader in order to create a BufferedReader. Use the Files class instead.
Close files in a finally block and not in a catch block. Hence use try-with-resources.
I believe it is better to propagate the exception to the calling method, i.e. method main in this case. I also believe that, unless you can safely ignore the exception, it is always beneficial to print the stack trace.
You don't want to process the first line of the file.
You appear to have your array indexes mixed up. According to the sample data, Andorra is at row 3 and column 3 (not column 4). Also, 17 is at [5][6] and not [6][6].
Two-dimensional arrays in Java can be declared with only one dimension indicated. Since you only want the first 27 lines of the file, you know how many rows will be in the 2D array.
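The index arithmetic in the notes above can be checked without a file. This is a small sketch under stated assumptions: the class CsvIndexDemo and the method toTable are hypothetical names, and the rows are the first six data lines from the question. Rows and columns are both zero-based once the header line is skipped.

```java
import java.util.Arrays;
import java.util.List;

public class CsvIndexDemo {

    // Drops the header row and splits each data line on commas.
    // The -1 limit keeps trailing empty fields, which plain split(",") discards.
    public static String[][] toTable(List<String> lines) {
        return lines.stream()
                .skip(1)
                .map(l -> l.split(",", -1))
                .toArray(String[][]::new);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Date,iso3,Continent,CountryName,lat,lon,CumulativePositive,CumulativeDeceased,CumulativeRecovered,CurrentlyPositive,Hospitalized,IntensiveCare,NUTS",
            "31/1/2021,AFG,AS,Afghanistan,33.930445,67.678945,55023,2400,,52623,,,AF",
            "31/1/2021,ALB,EU,Albania,41.156986,20.181222,78127,1380,47424,29323,324,19,AL",
            "31/1/2021,DZA,AF,Algeria,28.026875,1.65284,107122,2888,,104234,,,DZ",
            "31/1/2021,AND,EU,Andorra,42.542268,1.596865,9937,101,,9836,44,,AD",
            "31/1/2021,AGO,AF,Angola,-11.209451,17.880669,19782,464,,19318,,,AO",
            "31/1/2021,AIA,NA,Anguilla,18.225119,-63.07213,17,0,,17,,,AI");

        String[][] table = toTable(lines);
        System.out.println(table[3][3]); // Andorra
        System.out.println(table[5][6]); // 17
    }
}
```

A plain split on commas is only safe here because none of the sample fields contain quoted commas; a CSV library would be the more robust choice for arbitrary input.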
Representing Bytes From an mp3 File as Hexadecimal Strings
I am trying to read data from an mp3 file so that it can later be manipulated as hexadecimal. Suppose I open an mp3 file in a text editor and see the characters ÿû²d. The translation should read FF FB B2 64 in hexadecimal (indicating a header). However, the hex that appears in the output text file is 6E 75 6C 6C and I cannot figure out why.
Sources:
Java code To convert byte to Hexadecimal
convert audio,mp3 file to string and vice versa
How to check the charset of string in Java?
My code:
package mp3ToHex;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.math.BigInteger;
import java.nio.charset.*;

public class mp3ToHex {
    public static void main(String[] args) {
        //directories
        String fileIn = "Some\\Input\\Directory.mp3",
               fileOut = "Some\\Output\\Directory.txt";
        outputData(fileOut, fileIn);
    }

    @SuppressWarnings("unused")
    public static String readFile(String filename) {
        // variable representing a line of data in the mp3 file
        String line = "";
        try {
            BufferedReader br = new BufferedReader(new FileReader(new File(filename)));
            while (br.readLine() != null) {
                line += br.readLine();
                try {
                    if (br == null) {
                        // close reader when all data is read
                        br.close();
                    }
                } catch (FileNotFoundException e) {
                    e.getMessage();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        } catch (FileNotFoundException e) {
            e.getMessage();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return line;
    }

    public static void outputData(String outputFile, String inputFile) {
        try {
            // Create file
            FileWriter fileStream = new FileWriter(outputFile);
            BufferedWriter writer = new BufferedWriter(fileStream);
            // Convert string to hexadecimal
            String output = toHex(readFile(inputFile));
            StringBuilder s = new StringBuilder();
            for (int i = 0; i < output.length(); i++) {
                // Format for easier reading
                if (i % 64 == 0) s.append('\n');
                else if (i % 2 == 0) s.append(' ');
                s.append(output.charAt(i));
            }
            // Write to file
            writer.write(s.toString());
            // Close writer
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Converts strings to hexadecimal
    public static String toHex(String arg) throws UnsupportedEncodingException {
        return String.format("%02X", new BigInteger(1, arg.getBytes(charset(arg, new String[] {
            "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16" }))));
    }

    // Converts strings to different encodings
    public static String convert(String value, String fromEncoding, String toEncoding) throws UnsupportedEncodingException {
        return new String(value.getBytes(fromEncoding), toEncoding);
    }

    // Detects which Charset a string is encoded in by decoding and re-encoding a string.
    // The correct encoding is found if the transformation yields no changes.
    public static String charset(String value, String charsets[]) throws UnsupportedEncodingException {
        String probe = StandardCharsets.UTF_8.name();
        for (String c: charsets) {
            Charset charset = Charset.forName(c);
            if (charset != null) {
                if (value.equals(convert(convert(value, charset.name(), probe), probe, charset.name()))) {
                    return c;
                }
            }
        }
        return StandardCharsets.UTF_8.name();
    }
}
After a bit of experimentation with the program, I discovered that the run configuration's encoding setting altered the output. The issue was fixed by navigating to Run>Run Configurations>[file name]>Common>Encoding and selecting ISO-8859-1 from the dropdown.
Source: https://stackoverflow.com/a/18434549/10589287
Updated Code:
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.*;
import java.nio.file.Files;
import java.nio.file.Paths;

public class mp3ToHex {
    public static void main(String[] args) throws IOException {
        //directories
        String fileIn = "Some\\Input\\Directory\\input.mp3",
               fileOut = "Some\\Output\\Directory\\out.txt",
               log = "Some\\Log\\Directory\\log.txt",
               debug = "Some\\Debug\\Directory\\debug.mp3";
        // try-with-resources flushes and closes all three writers
        try (BufferedWriter br = new BufferedWriter(new FileWriter(fileOut));
             BufferedWriter brL = new BufferedWriter(new FileWriter(log));
             BufferedWriter brD = new BufferedWriter(new FileWriter(debug))) {
            String s = readFile(fileIn, StandardCharsets.ISO_8859_1);
            brD.write(s);
            // Encode with the charset used for decoding, so each char maps back to its
            // original byte regardless of the default encoding
            byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
            brL.write(bytesToHex(bytes));
            StringBuilder binary = new StringBuilder();
            for (byte b: bytes) {
                int val = b;
                for (int i = 0; i < 8; i++) {
                    binary.append((val & 128) == 0 ? 0 : 1);
                    val <<= 1;
                }
                binary.append(' ');
            }
            br.write(binary.toString());
        }
    }

    static String readFile(String path, Charset encoding) throws IOException {
        byte[] encoded = Files.readAllBytes(Paths.get(path));
        return new String(encoded, encoding);
    }

    private final static char[] hexArray = "0123456789ABCDEF".toCharArray();

    public static String bytesToHex(byte[] bytes) {
        char[] hexChars = new char[bytes.length * 2];
        for (int j = 0; j < bytes.length; j++) {
            int v = bytes[j] & 0xFF;
            hexChars[j * 2] = hexArray[v >>> 4];
            hexChars[j * 2 + 1] = hexArray[v & 0x0F];
        }
        return new String(hexChars);
    }
}
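As a side note, the bytes 6E 75 6C 6C are the ASCII codes for the text "null". One plausible explanation for the original output is that the second br.readLine() call inside the loop returned null and `line += null` appended that literal text, though this is an inference rather than something the thread confirms. The sketch below avoids Strings altogether by formatting raw bytes; the class name HexDump is hypothetical, and for a real file the byte array would come from Files.readAllBytes(Paths.get(path)).

```java
import java.nio.charset.StandardCharsets;

public class HexDump {

    // Formats each byte as two upper-case hex digits, separated by single spaces.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 3);
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so negative byte values are not sign-extended.
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xFF, (byte) 0xFB, (byte) 0xB2, 0x64};
        System.out.println(toHex(header)); // FF FB B2 64

        // The text "null" produces exactly the bytes seen in the question.
        System.out.println(toHex("null".getBytes(StandardCharsets.US_ASCII))); // 6E 75 6C 6C
    }
}
```

Working on byte[] directly means no charset is involved at any point, so the result cannot depend on the IDE's run configuration.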
implement sorting in java for file with records
I asked this question earlier and forgot to clarify what my question was, so hopefully I'm clear this time. I need help with sorting a bunch of records in a file based on their number, using an algorithm like bubble sort.
I have a file with 5 records, where each record consists of a number of integer type and a name of 32 characters (each record size should be 36 bytes). I have to store the records into a file.
This is what I need help with: then sort the records based on the numbers associated with them, using a sorting algorithm like bubble sort. Another requirement is that when the program sorts the records, it shouldn't read all records into memory at once but move them within the file. For example, after the program reads the first two records, it may switch the records (because 72 > 56) and write them in the same position in the file. We were provided with the classes to read/write and have random access to the file.
These are the records as they were provided:
72 James
56 Mark
87 John
30 Phillip
44 Andrew
I need to sort these names according to their respective numbers. My question is, what would be the best way to implement this sorting?
Here's the code for the writing class:
package test;

//write to a file
import java.io.*;

class FileWriteStreamTest {
    public static void main (String[] args) {
        FileWriteStreamTest f = new FileWriteStreamTest();
        f.writeMyFile();
    }

    void writeMyFile() {
        DataOutputStream dos = null;
        String record = null;
        int recCount = 0;
        try {
            File f = new File("mydata.txt");
            if (!f.exists())
                f.createNewFile();
            FileOutputStream fos = new FileOutputStream(f);
            BufferedOutputStream bos = new BufferedOutputStream(fos);
            dos = new DataOutputStream(bos);
            //store records into file
            dos.writeBytes(72 + " James \n");
            dos.writeBytes(56 + " Mark \n");
            dos.writeBytes(87 + " John \n");
            dos.writeBytes(30 + " Phillip \n");
            dos.writeBytes(44 + " Andrew \n");
        } catch (IOException e) {
            System.out.println("Uh oh, got an IOException error!" + e.getMessage());
        } finally {
            // if the file opened okay, make sure we close it
            if (dos != null) {
                try { dos.close(); }
                catch (IOException ioe) { }
            }
        }
    }
}
Here's the code for the reading class:
package test;

//read from a file
import java.io.*;

public class FileReadStreamTest {
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        FileReadStreamTest f = new FileReadStreamTest();
        f.readMyFile();
    }

    void readMyFile() {
        DataInputStream dis = null;
        String record = null;
        int recCount = 0;
        try {
            File f = new File("mydata.txt");
            if (!f.exists()) {
                System.out.println(f.getName() + " does not exist");
                return;
            }
            FileInputStream fis = new FileInputStream(f);
            BufferedInputStream bis = new BufferedInputStream(fis);
            dis = new DataInputStream(bis);
            while ( (record=dis.readLine()) != null ) {
                recCount++;
                System.out.println(recCount + ": " + record);
            }
        } catch (IOException e) {
            System.out.println("Uh oh, got an IOException error!" + e.getMessage());
        } finally {
            // if the file opened okay, make sure we close it
            if (dis != null) {
                try { dis.close(); }
                catch (IOException ioe) { }
            }
        }
    }
}
Here's the code for the random access class:
package test;

//read or write to any place in the file
import java.io.*;

class FileRandomAccessTest {
    public static void main (String[] args) {
        FileRandomAccessTest f = new FileRandomAccessTest();
        f.readWriteMyFile();
    }

    void readWriteMyFile() {
        RandomAccessFile raf = null;
        String s = null;
        try {
            File f = new File("mydata.txt");
            if (!f.exists()) // check if the file exists
                f.createNewFile(); // create a new file
            raf = new RandomAccessFile(f, "rw"); // open a file for random access with "r", "rw"
            if (raf.length() > 7) { // the size of the file
                raf.seek(7); // move the file pointer
                System.out.println(raf.readLine()); // read a line from the file pointer
                s = raf.readLine();
                System.out.println(s);
                raf.seek(raf.getFilePointer() - s.length()); // get the file pointer
                raf.writeBytes("Test RamdomAccessFile\n"); // write bytes
            }
        } catch (IOException e) {
            System.out.println("Uh oh, got an IOException error!" + e.getMessage());
        } finally {
            // if the file opened okay, make sure we close it
            if (raf != null) {
                try { raf.close(); } // close the file
                catch (IOException ioe) { }
            }
        }
    }
}
My current bubble sort implementation that needs to be adapted for this problem:
package test;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;

public class Sort {
    public static void bubbleSort(int[] num ) {
        int j;
        boolean flag = true; // set flag to true to begin first pass
        int temp; //holding variable
        while ( flag ) {
            flag= false; //set flag to false awaiting a possible swap
            for( j=0; j < num.length -1; j++ ) {
                if ( num[ j ] < num[j+1] ) {
                    temp = num[ j ]; //swap elements
                    num[ j ] = num[ j+1 ];
                    num[ j+1 ] = temp;
                    flag = true; //shows a swap occurred
                }
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        Scanner scanner = new Scanner(new File("numbers.txt"));
        int [] numbers = new int [5];
        int i = 0;
        while(scanner.hasNextInt()){
            numbers[i++] = scanner.nextInt();
        }
        bubbleSort(numbers);
        System.out.println(Arrays.toString(numbers));
    }
}
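One possible way to meet the "sort inside the file" requirement is sketched below. It rests on assumptions that are not in the provided classes: each record is stored in binary as a 4-byte int (written with writeInt) followed by a name padded with spaces to 32 bytes, which gives the 36-byte record size mentioned above, and the file name records.dat and the class FileBubbleSort are hypothetical. Only two records are held in memory at a time, and the order is ascending, matching the "72 > 56" swap described in the question.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FileBubbleSort {
    static final int NAME_BYTES = 32;
    static final int RECORD_BYTES = 4 + NAME_BYTES; // int key + fixed-width name

    // Writes one record at the given index: 4-byte int, then the name padded to 32 bytes.
    public static void writeRecord(RandomAccessFile raf, long index, int key, String name)
            throws IOException {
        byte[] padded = new byte[NAME_BYTES];
        Arrays.fill(padded, (byte) ' ');
        byte[] raw = name.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(raw, 0, padded, 0, Math.min(raw.length, NAME_BYTES));
        raf.seek(index * RECORD_BYTES);
        raf.writeInt(key);
        raf.write(padded);
    }

    // Reads the raw 36 bytes of the record at the given index.
    public static byte[] readRaw(RandomAccessFile raf, long index) throws IOException {
        byte[] rec = new byte[RECORD_BYTES];
        raf.seek(index * RECORD_BYTES);
        raf.readFully(rec);
        return rec;
    }

    // Extracts the key; ByteBuffer is big-endian by default, like writeInt.
    public static int keyOf(byte[] rec) {
        return ByteBuffer.wrap(rec).getInt();
    }

    // Bubble sort, ascending by key, swapping adjacent records in place in the file.
    public static void sort(RandomAccessFile raf) throws IOException {
        long n = raf.length() / RECORD_BYTES;
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (long i = 0; i + 1 < n; i++) {
                byte[] a = readRaw(raf, i);
                byte[] b = readRaw(raf, i + 1);
                if (keyOf(a) > keyOf(b)) {
                    raf.seek(i * RECORD_BYTES);
                    raf.write(b); // the file pointer advances, so a lands right after b
                    raf.write(a);
                    swapped = true;
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile("records.dat", "rw")) {
            raf.setLength(0);
            int[] keys = {72, 56, 87, 30, 44};
            String[] names = {"James", "Mark", "John", "Phillip", "Andrew"};
            for (int i = 0; i < keys.length; i++) {
                writeRecord(raf, i, keys[i], names[i]);
            }
            sort(raf);
            for (long i = 0; i < keys.length; i++) {
                byte[] rec = readRaw(raf, i);
                String name = new String(rec, 4, NAME_BYTES, StandardCharsets.US_ASCII).trim();
                System.out.println(keyOf(rec) + " " + name);
            }
        }
    }
}
```

Fixed-width records are what make this workable: record i always starts at byte i * 36, so two neighbours can be swapped with a single seek. With the newline-terminated text lines written by FileWriteStreamTest, records of different lengths could not be swapped in place this way.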
Reading File into a char[] array or arraylist
I need to read text from the user and create an array of characters so that I can run them through an FSM. However, I can't seem to get the BufferedReader to agree with a non-String type of input. Any advice? I also don't know whether I should be using an array or an ArrayList.
static ArrayList<Character> StringList = new ArrayList<Character>();
static char[] data;

public static void main(String[] args){
    InputStreamReader ISR = new InputStreamReader (System.in);
    BufferedReader BR = new BufferedReader(ISR);
    try{
        String sCurrentChar;
        while((sCurrentChar=BR.readLine())!=null){
            for(int i= 0; i<sCurrentChar.length(); i++)
                StringList.add(sCurrentChar.charAt(i));
        }
        for(int i =0; i<StringList.size(); i++){
            System.out.println(StringList.get(i));
        }
    } catch(IOException e){
        e.printStackTrace();
    }
}
Maybe something like the following could work for you, if you want to read raw byte data and perhaps use it later as characters. This might be a better approach than reading input a line at a time.
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

public class a {
    public static void main(String[] args){
        DataInputStream d = new DataInputStream(System.in);
        int[] bytes = new int[256];
        try {
            int b;
            int l = 0;
            while((b = d.readByte()) > 0) {
                bytes[l++] = b;
                if((l % 256) == 0)
                    bytes = Arrays.copyOf(bytes, (l + 256));
            }
        } catch(EOFException e) {
            // end-of-file
        } catch(IOException e) {
            System.err.println("AIEEEE: " + e);
            System.exit(-1);
        }
        for(int i = 0; bytes[i] > 0; i++)
            System.out.print((char)bytes[i]);
        System.exit(0);
    }
}
The way arrays are treated here is probably a fine example of how one should not do it, but then again, this is more about reading bytes/unsigned characters of data than efficiently processing arrays.
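If the goal is a char[] rather than raw bytes, a Reader can be drained directly without going through lines. This is a minimal sketch, assuming UTF-8 input; the class name ReadChars and the method readAll are hypothetical. CharArrayWriter grows its internal buffer, so no manual array resizing is needed.

```java
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReadChars {

    // Drains a Reader into a char[], keeping line terminators as ordinary characters.
    public static char[] readAll(Reader in) throws IOException {
        CharArrayWriter out = new CharArrayWriter();
        char[] chunk = new char[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toCharArray();
    }

    public static void main(String[] args) throws IOException {
        // Assumes the input is UTF-8; change the charset if it is not.
        char[] data = readAll(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        for (char c : data) {
            System.out.println(c);
        }
    }
}
```

Unlike readLine(), this keeps newline characters, which may matter if the state machine treats them as input symbols.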
Reading a file in Java
I have the following code to open and read a file. I'm having trouble figuring out how to have it go through the file and print the total number of each character, print the first and last character, and print the character exactly in the middle of the file. What's the most efficient way to do this?
This is the main class:
import java.io.IOException;

public class fileData {
    public static void main(String[ ] args) throws IOException {
        String file_name = "/Users/JDB/NetBeansProjects/Program/src/1200.dna";
        try {
            ReadFile file = new ReadFile(file_name);
            String[] arrayLines = file.OpenFile();
            int i;
            for (i=0; i<arrayLines.length; i++) {
                System.out.println(arrayLines[i]);
            }
        } catch (IOException e) {
            System.out.println(e.getMessage()) ;
        }
    }
}
and the other class:
import java.io.IOException;
import java.io.FileReader;
import java.io.BufferedReader;

public class ReadFile {
    private String path;

    public ReadFile (String file_path) {
        path = file_path;
    }

    public String[] OpenFile() throws IOException {
        FileReader fr = new FileReader(path);
        BufferedReader textReader = new BufferedReader(fr);
        int numberOfLines = readLines();
        String[] textData = new String[numberOfLines];
        int i;
        for(i=0; i<numberOfLines; i++) {
            textData[i] = textReader.readLine();
        }
        textReader.close();
        return textData;
    }

    int readLines() throws IOException {
        FileReader file_to_read = new FileReader(path);
        BufferedReader bf = new BufferedReader(file_to_read);
        String aLine;
        int numberOfLines = 0;
        while (( aLine = bf.readLine() ) != null) {
            numberOfLines++;
        }
        bf.close();
        return numberOfLines;
    }
}
Some hints which might help. A Map can be used to store information about each character in the alphabet. The middle of the file can be found from the size of the file.
These few lines of code will do it (using Apache's FileUtils library):
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

public static void main(String[] args) throws IOException {
    String str = FileUtils.readFileToString(new File("myfile.txt"));
    System.out.println("First: " + str.charAt(0));
    System.out.println("Last: " + str.charAt(str.length() - 1));
    System.out.println("Middle: " + str.charAt(str.length() / 2));
}
Anyone who says "you can't use libraries for homework" isn't being fair - in the real world we always use libraries in preference to reinventing the wheel.
The easiest approach to understand that I can think of is to read the entire file in as a String. Then use the methods of the String class to get the first, last, and middle character (the character at index str.length()/2). Since you are already reading the file a line at a time, you can use a StringBuilder to construct a string out of those lines. With the resulting String, the charAt() and substring() methods should let you get everything you want.
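The approach described above, combined with the earlier hint about using a Map, can be sketched using only the standard library. The class name CharStats and the sample string are assumptions for illustration; for a real file the text could be loaded with new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8).

```java
import java.util.Map;
import java.util.TreeMap;

public class CharStats {

    // Counts how often each character occurs; TreeMap keeps the output sorted.
    public static Map<Character, Integer> counts(String text) {
        Map<Character, Integer> result = new TreeMap<>();
        for (char c : text.toCharArray()) {
            result.merge(c, 1, Integer::sum);
        }
        return result;
    }

    public static char first(String text) {
        return text.charAt(0);
    }

    public static char last(String text) {
        return text.charAt(text.length() - 1);
    }

    // For an even length this picks the character just after the midpoint.
    public static char middle(String text) {
        return text.charAt(text.length() / 2);
    }

    public static void main(String[] args) {
        String dna = "ACGTTGCAA"; // stand-in for the file contents
        counts(dna).forEach((c, n) -> System.out.println(c + ": " + n));
        System.out.println("First: " + first(dna));
        System.out.println("Last: " + last(dna));
        System.out.println("Middle: " + middle(dna));
    }
}
```

All three positional methods throw StringIndexOutOfBoundsException on an empty string, so an empty file needs a separate check before calling them.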