I have a program that reads a paragraph from the user and writes it to a file. After that, it should count the occurrences of each letter (case-sensitive). However, it doesn't count the number of letter occurrences. I think I put the for loop in the wrong place.
import java.io.*;
import java.util.*;
public class Exercise1 {
public static int countLetters (String line, char alphabet) {
int count = 0;
for (int i = 0; i <= line.length()-1; i++) {
if (line.charAt(i) == alphabet)
count++;
}
return count;
}
public static void main(String[] args) throws IOException {
BufferedReader buffer = new BufferedReader (new InputStreamReader(System.in));
PrintWriter outputStream = null;
Scanner input = new Scanner (System.in);
int total;
try {
outputStream = new PrintWriter (new FileOutputStream ("par.txt"));
System.out.println("How many lines are there in the paragraph you'll enter?");
int lines = input.nextInt();
System.out.println("Enter the paragraph: ");
String paragraph = buffer.readLine();
outputStream.println(paragraph);
int j;
for (j = 1; j<lines; j++) {
paragraph = buffer.readLine();
outputStream.println(paragraph);
}
outputStream.close();
System.out.println("The paragraph is written to par.txt");
for (int k=1; k<lines; k++) {
paragraph = buffer.readLine();
total = countLetters (paragraph, 'A');
if (total != 0)
System.out.println("A: "+total);
//I'll do bruteforce here up to lowercase z
}
}
catch(FileNotFoundException e) {
System.out.println("Error opening the file par.txt");
}
}
}
Please help me fix the code. I'm new to programming and I need help. Thank you very much!
First, reading the first line of input before the loop and the remaining lines inside it is a bit wasteful. This is not a bug, just something that can be written more cleanly.
// your code
String paragraph = buffer.readLine();
outputStream.println(paragraph);
int j;
for (j = 1; j<lines; j++) {
paragraph = buffer.readLine();
outputStream.println(paragraph);
}
You can just put them in the loop:
// better code
String paragraph;
int j;
for (j = 0; j<lines; j++) {
paragraph = buffer.readLine();
outputStream.println(paragraph);
}
Your actual problem comes from the way you read the lines the second time:
// your code - not working
outputStream.close();
for (int k=1; k<lines; k++) {
paragraph = buffer.readLine();
total = countLetters (paragraph, 'A');
Consider what happened above:
The input is already done, the output has been written and the stream is closed. Up to here everything is good.
Then, when you try to count the characters, you call paragraph = buffer.readLine(). What does this code do? It waits for more user input instead of reading what has already been entered.
To fix this, you need to read back what has already been written instead of asking for more input. Then, instead of brute-forcing every character one by one, you can put all the characters in a string and loop over it.
So now you want to read from the file you already created (i.e., read what the user already entered):
BufferedReader fileReader = new BufferedReader(new FileReader(new File("par.txt")));
String allCharacters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
String aLineInFile;
// Read the file that was written earlier (whose content comes from user input)
// This while loop will go through line-by-line in the file
while((aLineInFile = fileReader.readLine()) != null)
{
    // For every line in the file, count number of occurrences of characters
    // This loop goes through every character (a-z and A-Z)
    for(int i = 0; i < allCharacters.length(); i++)
    {
        // For each single character, check the number of occurrences in the current line.
        // countLetters takes a char, so pass the char itself rather than a String.
        char charToLookAt = allCharacters.charAt(i);
        int numOfCharOccurrencesInLine = countLetters(aLineInFile, charToLookAt);
        System.out.println("For line: " + aLineInFile + ", Character: " + charToLookAt + " appears: " + numOfCharOccurrencesInLine + " times");
    }
}
fileReader.close();
The above gives you the number of occurrences of every character in every line - now you just need to organize them to keep track of how many are in total for the whole file.
Code-wise, there might be a better way to write this with a cleaner implementation, but the above is easy to understand (and I just wrote it very quickly).
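A minimal sketch of the remaining step mentioned above (keeping whole-file totals instead of per-line counts), assuming the same allCharacters string and a countLetters(String, char) helper like the one in the question. The class name LetterTotals and the method totals are hypothetical; totals takes any BufferedReader, so you can pass the par.txt reader or, as in main, an in-memory string.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class LetterTotals {
    static final String ALL_CHARACTERS =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Same idea as countLetters in the question: occurrences of one char in a line.
    static int countLetters(String line, char alphabet) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == alphabet) {
                count++;
            }
        }
        return count;
    }

    // totals[i] is the whole-file count of ALL_CHARACTERS.charAt(i).
    static int[] totals(BufferedReader reader) throws IOException {
        int[] totals = new int[ALL_CHARACTERS.length()];
        String line;
        while ((line = reader.readLine()) != null) {
            for (int i = 0; i < ALL_CHARACTERS.length(); i++) {
                totals[i] += countLetters(line, ALL_CHARACTERS.charAt(i));
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        int[] totals = totals(new BufferedReader(new StringReader("Aab\nbA")));
        // Print only the characters that actually occur.
        for (int i = 0; i < totals.length; i++) {
            if (totals[i] != 0) {
                System.out.println(ALL_CHARACTERS.charAt(i) + ": " + totals[i]);
            }
        }
    }
}
```

For the input "Aab\nbA" this prints a: 1, b: 2 and A: 2, in the order of ALL_CHARACTERS.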
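Do everything in one loop, counting each line as you read and write it (with the first readLine() before the loop removed, so the loop starts at 0 and reads every line):
for (j = 0; j < lines; j++) {
    paragraph = buffer.readLine();
    total = countLetters(paragraph, 'A');
    if (total != 0)
        System.out.println("A: " + total);
    outputStream.println(paragraph);
}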
You can use a HashMap to count each letter, case-sensitively:
import java.util.HashMap;
// 52 = 26 lowercase + 26 uppercase letters
final HashMap<Character, Integer> tabChar = new HashMap<Character, Integer>(52);
// stands in for: paragraph = buffer.readLine();
// Unless you use it outside, you can declare it 'final'
final char[] paragraph = "azera :;,\nApOUIQSaOOOF".toCharArray();
for (final char c : paragraph) {
    // Character.isLetter skips spaces, punctuation and the newline
    if (Character.isLetter(c)) {
        Integer tot = tabChar.get(c);
        tabChar.put(c, (null == tot) ? 1 : tot + 1);
    }
}
Output (a HashMap does not guarantee any iteration order, so the order of entries may differ):
{F=1, A=1, O=4, I=1, U=1, Q=1, S=1, e=1, a=3, r=1, p=1, z=1}
You can use final TreeSet<Character> ts = new TreeSet<>(tabChar.keySet()); to sort the characters, and then look up each count with tabChar.get(c).
The previous answers would solve your problem, but another way to avoid brute force is to loop over the ASCII character values.
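One way to read that suggestion, as a sketch: a char is just a number, so a single int array indexed by char value can hold every count, and a for loop over 'A'..'Z' and 'a'..'z' prints them without spelling out each letter. The class and method names are illustrative, and the array size of 128 assumes plain ASCII input (any other character is skipped).

```java
class AsciiLetterCount {
    // counts[c] is the number of times char c occurs; only ASCII (0-127) is tracked.
    static int[] count(String text) {
        int[] counts = new int[128];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < counts.length) {
                counts[c]++;
            }
        }
        return counts;
    }

    // Uppercase first, then lowercase, like the "brute force" plan in the question.
    static void printLetters(int[] counts) {
        for (char c = 'A'; c <= 'Z'; c++) {
            if (counts[c] != 0) {
                System.out.println(c + ": " + counts[c]);
            }
        }
        for (char c = 'a'; c <= 'z'; c++) {
            if (counts[c] != 0) {
                System.out.println(c + ": " + counts[c]);
            }
        }
    }

    public static void main(String[] args) {
        printLetters(count("Hello, World"));
    }
}
```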
Related
I'm reading the contents of a text file char by char, then sorting them in ascending order and counting the number of times each char occurs. When I run the program my numbers are way off. For example, there are 7 'A's in the file, but I get 17. I think this means either something is wrong with my counting or with the way I'm reading the chars. Any ideas on what is wrong?
public class CharacterCounts {
public static void main(String[] args) throws IOException{
String fileName = args[0];
BufferedReader in = new BufferedReader(new FileReader(new File(fileName)));
ArrayList<Character> vals = new ArrayList<Character>();
ArrayList<Integer> valCounts = new ArrayList<Integer>();
while(in.read() != -1){
vals.add((char)in.read());
}
Collections.sort(vals);
//This counts how many times each char occures,
//resets count to 0 upon finding a new char.
int count = 0;
for(int i = 1; i < vals.size(); i++){
if(vals.get(i - 1) == vals.get(i)){
count++;
} else {
valCounts.add(count + 1);
count = 0;
}
}
//Removes duplicates from vals by moving from set then back to ArrayList
Set<Character> hs = new HashSet<Character>();
hs.addAll(vals);
vals.clear();
vals.addAll(hs);
//System.out.print(vals.size() + "," + valCounts.size());
for(int i = 0; i < vals.size(); i++){
//System.out.println(vals.get(i));
System.out.printf("'%c' %d\n", vals.get(i), valCounts.get(i));
}
}
}
When you write
if(vals.get(i - 1) == vals.get(i)){
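both sides are Character objects, so == compares references, not values. It only appears to work for characters with codes below 128, because autoboxing goes through Character.valueOf, which caches those instances. You have to compare their values.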
You want
if(vals.get(i - 1).equals(vals.get(i))){
I think you are overcomplicating your count logic. In addition, you call read() twice per loop iteration (once in the while condition and once in the body), so you are skipping every other character.
int[] counts = new int[Character.MAX_VALUE + 1]; // one slot per char value: Reader.read() returns 0..65535
int i;
while ((i = in.read()) != -1) { // Note you should only be calling read once for each value
    counts[i]++;
}
System.out.println(counts['a']);
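For the original goal (sorted output with a count per character), a TreeMap can do the sorting and the counting at once. The sketch below is one way to do it, not the answerer's code: it calls read() exactly once per character, as the comment above recommends, and takes any Reader so it can be tried on a string (in the real program you would pass the FileReader for args[0]). The class name SortedCharCounts is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;

class SortedCharCounts {
    // Reads every char once and returns the counts keyed in ascending char order.
    static Map<Character, Integer> count(Reader in) throws IOException {
        Map<Character, Integer> counts = new TreeMap<>();
        int ch;
        while ((ch = in.read()) != -1) {  // exactly one read() per character
            counts.merge((char) ch, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        for (Map.Entry<Character, Integer> e : count(new StringReader("banana")).entrySet()) {
            System.out.printf("'%c' %d%n", e.getKey(), e.getValue());
        }
    }
}
```

This prints 'a' 3, 'b' 1 and 'n' 2. No separate sort, duplicate removal or index-matching between two lists is needed.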
Why not use a regex instead? The code will be more flexible and simpler. Have a look at the code below:
...
final BufferedReader reader = new BufferedReader(new FileReader(filename));
final StringBuilder contents = new StringBuilder();
//read content into a string builder; readLine() returns null at end of file
//(ready() is not a reliable end-of-file test)
String line;
while((line = reader.readLine()) != null) {
    contents.append(line);
}
reader.close();
Map<Character,Integer> report = new TreeMap<>();
//init a counter
int count = 0;
//Iterate the chars from 'a' to 'z' inclusive
for(char a = 'a'; a <= 'z'; a++){
    String c = Character.toString(a);
    String C = c.toUpperCase();
    //match uppercase and lowercase char (so this count is case-insensitive)
    Pattern pattern = Pattern.compile("[" + c + C + "]");
    Matcher m = pattern.matcher(contents);
    while(m.find()){
        count++;
    }
    if(count>0){
        report.put(a, count);
    }
    //reset the counter
    count=0;
}
System.out.println(report);
...
I am trying to write code that counts the number of words of a certain length in a file.
For example:
How are you?
would print:
Proportion of 3-letter words: 100% (3 words)
I want to count words of length 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13+
Can you please guide me?
I am NOT trying to find the number of words. I can already do that with this code:
public static int WordCount() throws FileNotFoundException
{
File file = new File("sample.txt");
Scanner keyboard = new Scanner(new FileInputStream(file));
int count=0;
while(keyboard.hasNext())
{
keyboard.next();
count++;
}
return count;
}
I want to find words of a certain length.
UPDATE
I have written the following code:
public static int WordLengthCount() throws FileNotFoundException
{
File file = new File("hello.txt");
Scanner keyboard = new Scanner(new FileInputStream(file));
int count5 = 0;
int hell = 0; //This is just for the else command to compile
while(keyboard.hasNext())
{
if ( keyboard.next().length() == 5 )
{
count5++;
keyboard.next();
return count5;
}
} return hell;
}
You can use the length() method to count the number of characters in a string (word). From there on, it's just a matter of saving the result somewhere, e.g., in a Map:
public static Map<Integer, Integer> lengthCounts() throws FileNotFoundException
{
    // same file as in the question
    Scanner keyboard = new Scanner(new FileInputStream(new File("sample.txt")));
    Map<Integer, Integer> countMap = new HashMap<>();
    while(keyboard.hasNext())
    {
        String word = keyboard.next();
        int length = word.length();
        Integer currCount = countMap.get(length);
        if (currCount == null) {
            countMap.put(length, 1);
        } else {
            countMap.put(length, currCount + 1);
        }
    }
    keyboard.close();
    return countMap;
}
Now you could check the number of words with any particular length, or even print all of them.
EDIT:
If the only thing you need is the percentage of words of a certain length, all you need are two counters - one for the words of that length, and one for all the words:
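public static double lengthPercentage(int requiredLength) throws FileNotFoundException
{
    // same file as in the question
    Scanner keyboard = new Scanner(new FileInputStream(new File("sample.txt")));
    int allWords = 0;
    int requiredWords = 0;
    while(keyboard.hasNext())
    {
        String word = keyboard.next();
        int length = word.length();
        if (length == requiredLength) {
            ++requiredWords;
        }
        ++allWords;
    }
    keyboard.close();
    // implicit assumption: there's at least one word in the file
    // (this returns a fraction; multiply by 100 for a percentage)
    return ((double) requiredWords) / allWords;
}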
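Putting the two ideas above together (a count per length plus a running total) produces the report the question asks for. This is a sketch with several assumptions: lengths of 13 and above share one "13+" bucket, and punctuation is stripped before measuring so that "you?" counts as 3 letters, which matches the question's example but differs from the answers above. The class and method names are hypothetical, and countLengths reads from any Scanner so it can be tried on a string.

```java
import java.util.Scanner;

class WordLengthReport {
    static final int MAX_BUCKET = 13; // lengths 13 and above share the last bucket

    // counts[n] = words of length n for n = 1..12; counts[13] = words of 13+ letters.
    static int[] countLengths(Scanner in) {
        int[] counts = new int[MAX_BUCKET + 1];
        while (in.hasNext()) {
            // Assumption: punctuation is not part of a word, so "you?" counts as 3.
            String word = in.next().replaceAll("[^A-Za-z]", "");
            if (word.isEmpty()) {
                continue; // the token was pure punctuation, e.g. "-"
            }
            counts[Math.min(word.length(), MAX_BUCKET)]++;
        }
        return counts;
    }

    static void printReport(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            return; // nothing to report, and avoids dividing by zero
        }
        for (int n = 1; n <= MAX_BUCKET; n++) {
            if (counts[n] == 0) {
                continue;
            }
            String label = (n == MAX_BUCKET) ? n + "+" : String.valueOf(n);
            double percent = 100.0 * counts[n] / total;
            System.out.printf("Proportion of %s-letter words: %.0f%% (%d words)%n",
                    label, percent, counts[n]);
        }
    }

    public static void main(String[] args) {
        // For a real file: new Scanner(new File("sample.txt"))
        printReport(countLengths(new Scanner("How are you?")));
    }
}
```

For "How are you?" this prints Proportion of 3-letter words: 100% (3 words).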
File file = new File("sample.txt");
Scanner keyboard = new Scanner(new FileInputStream(file));
int count=0;
while(keyboard.hasNext())
{
keyboard.next();
// Use a hash map
// Check the word's length and add it to the hash map: if that length already exists, get its current count from the map, increment it by one and store it again.
count++;
}
Your final output will then be a map with the count of one-letter words, two-letter words, and so on.
The other answers are great, but if you are trying to find words of a specific length in a file and don't like the answers above, you could also try a regex. You can test each word and then do what you want with it. If you want a count of the words of each length in a file, I think the answer above is better, but if you just want to detect a word of a specific length, you could use .length() or the regex below. In my opinion, using a String's .length() method is better, but I'm giving you an alternative answer and example.
I'll put a small example below.
public class Words{
    public static void main(String [] args){
        String [] words = {"Pizzaaa", "Pizza", "Party"};
        int fives = 0;
        for( String s : words){
            // ".{5}" matches exactly five characters
            if(s.matches(".{5}")){
                fives++;
            }
        }
        System.out.println(fives);
    }
}
Or a better version:
public class Words{
    public static void main(String [] args){
        String [] words = {"Pizzaaa", "Pizza", "Party"};
        int fives = 0;
        for( String s : words){
            if(s.length() == 5){
                fives++;
            }
        }
        System.out.println(fives);
    }
}
Edit: here is how it can be used in a file-based loop:
// other code needed
while(in.hasNext())
{
String s = in.next();
if(s.length() == 5)
fives++;
}
For example, suppose I have a text file named TextFile.txt at C:\ with this content:
Ut porttitor libero sodales quam sagittis, id facilisis lectus semper.
and Java code:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class Example {
    public static void main(String[] args) throws IOException {
        // BufferedReader.readLine() instead of the deprecated DataInputStream.readLine()
        try (BufferedReader reader = new BufferedReader(new FileReader("C:\\TextFile.txt"))) {
            // Get the (first) line; readLine() returns null for an empty file.
            String s = reader.readLine();
            if (s != null) {
                // Put words to array.
                String[] sParts = s.split(" ");
                // Find the length of the longest word.
                int longestLength = 1;
                for (String strx : sParts) { // Go through each sPart, the next one is called strx
                    // If this word is longer than any seen so far,
                    if (longestLength < strx.length())
                        // set the new longest length.
                        longestLength = strx.length();
                }
                // +1 because array indices start at 0, so counts[n] can hold n-letter words.
                int[] counts = new int[longestLength + 1];
                for (String str : sParts) {
                    // Add one to the number of words that length has
                    counts[str.length()] += 1;
                }
                // We use this type of loop since we need the length.
                for (int i = 1; i < counts.length; i++) {
                    System.out.println(i + " letter words: " + counts[i]);
                }
            }
        }
    }
}
// Result:
// 1 letter words: 0
// 2 letter words: 2
// 3 letter words: 0
// 4 letter words: 1
// 5 letter words: 0
// 6 letter words: 2
// 7 letter words: 2
// 8 letter words: 0
// 9 letter words: 3
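If Java 8 or later is available, a stream pipeline can do the same counting over every line of the file rather than only the first. This is a swapped-in technique, not the answer's code. The class name is illustrative and, like the code above, it keeps punctuation attached to words, so "sagittis," counts as 9 characters.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class WordLengthStreams {
    // Maps each word length to how many words have it, sorted by length.
    static Map<Integer, Long> lengthCounts(Stream<String> lines) {
        return lines
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())   // a blank line splits to [""]
                .collect(Collectors.groupingBy(String::length, TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(lengthCounts(Stream.of(
                "Ut porttitor libero sodales quam sagittis, id facilisis lectus semper.")));
    }
}
```

On the sample line this prints {2=2, 4=1, 6=2, 7=2, 9=3}, the same non-zero counts as the result above. To read a real file, pass Files.lines(Paths.get("C:\\TextFile.txt")) and close that stream when done, for example with try-with-resources.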
I'm trying to build a program with BufferedReader that reads a file, counts vowels and words, and calculates the average number of words per line. I have the skeleton in place to read the file, but I really don't know where to take it from here. Any help would be appreciated. Thanks.
import java.io.*;
public class JavaReader
{
public static void main(String[] args) throws IOException
{
String line;
BufferedReader in;
in = new BufferedReader(new FileReader("message.txt"));
line = in.readLine();
while(line != null)
{
System.out.println(line);
line = in.readLine();
}
}
}
Here's what I got. The word counting is questionable, but it works for the example below. Changes can be made (I accept criticism).
import java.io.*;
public class JavaReader
{
    public static void main(String[] args) throws IOException
    {
        BufferedReader in = new BufferedReader(new FileReader("message.txt"));
        String line = in.readLine();
        // for keeping track of the file content
        StringBuilder fileText = new StringBuilder();
        while(line != null) {
            fileText.append(line).append("\n");
            line = in.readLine();
        }
        in.close();
        // put file content to a string, display it for a test
        String fileContent = fileText.toString();
        System.out.println(fileContent + "--------------------------------");
        int vowelCount = 0, lineCount = 0;
        // for every char in the file
        for (char ch : fileContent.toCharArray())
        {
            // if this char is a vowel (lower or upper case)
            if ("aeiouAEIOU".indexOf(ch) > -1) {
                vowelCount++;
            }
            // if this char is a new line
            if (ch == '\n') {
                lineCount++;
            }
        }
        int wordCount = checkWordCount(fileContent);
        // cast to double so the division is not truncated
        double avgWordCountPerLine = (double) wordCount / lineCount;
        System.out.println("Vowel count: " + vowelCount);
        System.out.println("Line count: " + lineCount);
        System.out.println("Word count: " + wordCount);
        System.out.print("Average word count per line: "+avgWordCountPerLine);
    }
    public static int checkWordCount(String fileContent) {
        // split words by punctuation and whitespace
        String words[] = fileContent.split("[\\n .,;:&?]"); // array of words
        String punctuations = ".,:;";
        int wordCount = 0;
        // for every word in the word array
        for (String word : words) {
            // only check if it's a word if the word isn't whitespace
            if (!word.trim().isEmpty()) {
                // reset for every word; otherwise one punctuation token would stop all later counting
                boolean isPunctuation = false;
                // for every punctuation
                for (char punctuation : punctuations.toCharArray()) {
                    // if the trimmed word is just a punctuation
                    if (word.trim().equals(String.valueOf(punctuation)))
                    {
                        isPunctuation = true;
                    }
                }
                // only add one to wordCount if the word wasn't punctuation
                if (!isPunctuation) {
                    wordCount++;
                }
            }
        }
        return wordCount;
    }
}
Sample input/output:
File:
This is a test. How do you do?
This is still a test.Let's go,,count.
Output:
This is a test. How do you do?
This is still a test.Let's go,,count.
--------------------------------
Vowel count: 18
Line count: 4
Word count: 16
Average word count per line: 4.0
You can use a Scanner to pass over the line and retrieve every token (word) of the string line.
line = line.replaceAll("[^a-zA-Z\\s]", ""); //remove all punctuation, but keep the whitespace between words
line = line.toLowerCase(); //make line lower case
Scanner scan = new Scanner(line);
String word = scan.next();
Then you could loop through each token to calculate the vowels in each word.
for(int i = 0; i < word.length(); i++){
//get char
char c = word.charAt(i);
//check if the char is a vowel here
if("aeiou".indexOf(c) > -1){
//c is vowel
}
}
All you need to do is set a couple of counter ints to keep track of these and you're laughing.
Ahh, if you want to make sure that non-words such as " - " are not counted as words, the easiest way is probably to strip every character other than letters and whitespace out of the text.
I also added it above.
line = line.replaceAll("[^a-zA-Z\\s]", "");
line = line.toLowerCase();
Oh, and since you are new to Java, don't forget to import
import java.util.Scanner;
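Here is a sketch that wires the Scanner idea above into all three counters (lines, words, vowels) plus the average. It is an illustration, not the answerer's code. One deliberate difference: non-letters (apostrophes excepted) are replaced with a space instead of deleted, so text like test.Let's still splits into two words. The class TextStats and its members are hypothetical.

```java
import java.util.Scanner;

class TextStats {
    int lines;
    int words;
    int vowels;

    static TextStats of(Scanner in) {
        TextStats stats = new TextStats();
        while (in.hasNextLine()) {
            String line = in.nextLine();
            stats.lines++;
            // Replace anything that is not a letter, apostrophe or whitespace with a
            // space, so "test.Let's" becomes two words instead of one merged token.
            String cleaned = line.replaceAll("[^a-zA-Z'\\s]", " ").toLowerCase();
            Scanner tokens = new Scanner(cleaned);
            while (tokens.hasNext()) {
                String word = tokens.next();
                stats.words++;
                for (int i = 0; i < word.length(); i++) {
                    if ("aeiou".indexOf(word.charAt(i)) > -1) {
                        stats.vowels++;
                    }
                }
            }
            tokens.close();
        }
        return stats;
    }

    double averageWordsPerLine() {
        return lines == 0 ? 0.0 : (double) words / lines;
    }

    public static void main(String[] args) {
        // For the real file: new Scanner(new File("message.txt"))
        TextStats s = of(new Scanner(
                "This is a test. How do you do?\nThis is still a test.Let's go,,count."));
        System.out.println("Vowels: " + s.vowels + ", words: " + s.words
                + ", lines: " + s.lines + ", average: " + s.averageWordsPerLine());
    }
}
```

On the two-line sample text from the previous answer, this reports 18 vowels and 16 words over 2 lines, an average of 8.0.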
I have no idea how to start my assignment.
We have to make a run-length encoding program.
For example, the user enters this string:
aaaaPPPrrrrr
is replaced with
4a3P5r
Can someone help me get started with it?
Hopefully this will get you started on your assignment:
The fundamental idea behind run-length encoding is that consecutively occurring tokens like aaaa can be replaced by a shorter form 4a (meaning "the following four characters are an 'a'"). This type of encoding was used in the early days of computer graphics to save space when storing an image. Back then, video cards supported a small number of colors, and images commonly had the same color all in a row for significant portions of the image.
You can read up on it in detail on Wikipedia
http://en.wikipedia.org/wiki/Run-length_encoding
In order to run-length encode a string, you can loop through the characters in the input string. Have a counter that counts how many times you have seen the same character in a row. When you then see a different character, output the value of the counter and then the character you have been counting. If the value of the counter is 1 (meaning you only saw one of those characters in a row) skip outputting the counter.
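The loop described in that paragraph can be sketched as follows, including its rule of omitting the count for a run of length 1. This is an illustration, not a reference implementation, and the class name RunLength is hypothetical. The question's example 4a3P5r contains no run of length 1, so it comes out the same under either convention.

```java
class RunLength {
    // Encodes each run as count + char, omitting the count for runs of length 1.
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char current = text.charAt(i);
            int count = 1;
            // Extend the run while the next character is the same.
            while (i + count < text.length() && text.charAt(i + count) == current) {
                count++;
            }
            if (count > 1) {
                out.append(count); // skip the counter for a single character
            }
            out.append(current);
            i += count; // jump to the first character of the next run
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("aaaaPPPrrrrr")); // 4a3P5r
    }
}
```

If the input itself can contain digits, the encoded output becomes ambiguous. That is a limitation of this simple format rather than of the loop.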
public String runLengthEncoding(String text) {
String encodedString = "";
for (int i = 0, count = 1; i < text.length(); i++) {
if (i + 1 < text.length() && text.charAt(i) == text.charAt(i + 1))
count++;
else {
encodedString = encodedString.concat(Integer.toString(count))
.concat(Character.toString(text.charAt(i)));
count = 1;
}
}
return encodedString;
}
Try this one out.
This can easily and simply be done using a StringBuilder and a few helper variables to keep track of how many of each letter you've seen. Then just build as you go.
For example:
static String encode(String s) {
    if (s.isEmpty()) return ""; // word[0] below would fail on an empty string
    StringBuilder sb = new StringBuilder();
    char[] word = s.toCharArray();
    char current = word[0]; // We initialize to compare vs. first letter
    // our helper variables
    int index = 0; // tracks how far along we are
    int count = 0; // how many of the same letter we've seen
    for (char c : word) {
        if (c == current) {
            count++;
            index++;
            if (index == word.length)
                sb.append(count).append(current); // count first: "4a", not "a4"
        }
        else {
            sb.append(count).append(current);
            count = 1;
            current = c;
            index++;
            if (index == word.length) // the last letter starts a new run of one
                sb.append(count).append(current);
        }
    }
    return sb.toString();
}
Since this is clearly a homework assignment, I challenge you to learn the approach rather than simply use the answer as your solution. StringBuilders are very useful for building strings as you go, which keeps your runtime O(n) in many cases. Here we use two helper variables, "index" to track where we are in the iteration and "count" to track how many of a particular letter we've seen, so we keep all the information needed to build the encoded string as we go.
Try this out:
private static String encode(String sampleInput) {
    // e.g. sampleInput = "aabbcccd" gives "2a2b3c1d"
    char[] charArr = sampleInput.toCharArray();
    int counter = 1;
    //compare each element with its next element and
    //if same increment the counter
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < charArr.length; i++) {
        if (i + 1 < charArr.length && charArr[i] == charArr[i + 1]) {
            counter++;
        } else {
            // the run ends here: append the count, then the character
            sb.append(counter).append(charArr[i]);
            counter = 1;
        }
    }
    return sb.toString();
}
Here is my solution in Java. (Note that it counts each character's total number of occurrences rather than consecutive runs, so it matches run-length encoding only when each character appears in a single run. For example, "aabaa" gives "4a1b" instead of "2a1b2a".)
public String encodingString(String s){
    StringBuilder encodedString = new StringBuilder();
    List<Character> listOfChars = new ArrayList<Character>();
    // LinkedHashSet keeps insertion order (a plain HashSet's order is unspecified)
    Set<String> removeRepeated = new LinkedHashSet<String>();
    //Adding characters of string to list
    for(int i=0;i<s.length();i++){
        listOfChars.add(s.charAt(i));
    }
    //Getting the occurrence count of each character and adding it to the set to avoid repeated strings
    for(char j:listOfChars){
        String temp = Integer.toString(Collections.frequency(listOfChars,j))+Character.toString(j);
        removeRepeated.add(temp);
    }
    //Constructing the encoded string.
    for(String k:removeRepeated){
        encodedString.append(k);
    }
    return encodedString.toString();
}
import java.util.Scanner;
/**
 * @author jyotiv
 */
public class RunLengthEncoding {
    /**
     * @param args command-line arguments (unused)
     */
    public static void main(String[] args) {
        System.out.println("Enter line to encode:");
        Scanner s = new Scanner(System.in);
        String input = s.nextLine();
        int len = input.length();
        if (len == 0) { // input.charAt(0) below would throw on an empty line
            System.out.println("Encoded line is: ");
            return;
        }
        int i = 0;
        int noOfOccurencesForEachChar = 0;
        char storeChar = input.charAt(0);
        String outputString = "";
        for(;i<len;i++)
        {
            if(i+1<len)
            {
                if(input.charAt(i) == input.charAt(i+1))
                {
                    noOfOccurencesForEachChar++;
                }
                else
                {
                    // decimal count (Integer.toHexString would print 10 as "a")
                    outputString = outputString +
                    Integer.toString(noOfOccurencesForEachChar+1) + storeChar;
                    noOfOccurencesForEachChar = 0;
                    storeChar = input.charAt(i+1);
                }
            }
            else
            {
                // last character: flush the final run
                outputString = outputString +
                Integer.toString(noOfOccurencesForEachChar+1) + storeChar;
            }
        }
        System.out.println("Encoded line is: " + outputString);
    }
}
I have tried this one. It will work for sure.
Thanks in advance.
I just solved Project Euler #22, a problem that involves reading about 5,000 lines of text from a file and determining the value of each name based on the sum of that String's characters and its alphabetical position.
However, the code takes about 5-10 seconds to run, which is a bit annoying. What is the best way to optimize this code? I'm currently using a Scanner to read the file into a String. Is there another, more efficient way to do this? (I tried using a BufferedReader, but that was even slower)
public static int P22(){
String s = null;
try{
//create a new Scanner to read file
Scanner in = new Scanner(new File("names.txt"));
while(in.hasNext()){
//add the next line to the string
s+=in.next();
}
}catch(Exception e){
}
//this just filters out the quotation marks surrounding all the names
String r = "";
for(int i = 0;i<s.length();i++){
if(s.charAt(i) != '"'){
r += s.charAt(i);
}
}
//splits the string into an array, using the commas separating each name
String text[] = r.split(",");
Arrays.sort(text);
int solution = 0;
//go through each string in the array, summing its characters
for(int i = 0;i<text.length;i++){
int sum = 0;
String name = text[i];
for(int j = 0;j<name.length();j++){
sum += (int)name.charAt(j)-64;
}
solution += sum*(i+1);
}
return solution;
}
If you're going to use Scanner, why not use it for what it's supposed to do (tokenisation)?
Scanner in = new Scanner(new File("names.txt")).useDelimiter("[\",]+");
ArrayList<String> text = new ArrayList<String>();
while (in.hasNext()) {
text.add(in.next());
}
Collections.sort(text);
You do not need to strip quotes, or split on commas - Scanner does it all for you.
This snippet, including java startup time, executes in 0.625s (user time) on my machine. I suspect it should be a bit faster than what you were doing.
EDIT: The OP asked what the string passed to useDelimiter was. It's a regular expression. When you strip out the escaping Java requires to include a quote character in a string, it's [",]+ and the meaning is:
[...] character class: match any of these characters, so
[",] match a quote or a comma
...+ one-or-more occurrence modifier, so
[",]+ match one or more of quotes or commas
Sequences that would match this pattern include:
"
,
,,,,
""",,,",","
and indeed ",", which is exactly what we were going after here.
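The same tokenising loop can be carried through to the full score. This is a sketch, not the answerer's code: it uses the [",]+ delimiter from above, letter values A=1 to Z=26 as in the question's charAt(j)-64, and weights each name by its 1-based position after sorting. The trim() guard is an added assumption: if the file ends with a newline after the last quote, that newline would otherwise be returned as a token. The class name NamesScore is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;

class NamesScore {
    // Sums (alphabetical position) * (letter values, A=1 .. Z=26) over all names.
    static long score(Scanner in) {
        in.useDelimiter("[\",]+"); // quotes and commas separate the names
        List<String> names = new ArrayList<>();
        while (in.hasNext()) {
            String name = in.next().trim(); // e.g. a trailing newline would be a token
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        Collections.sort(names);
        long total = 0;
        for (int i = 0; i < names.size(); i++) {
            int letterSum = 0;
            for (char c : names.get(i).toCharArray()) {
                letterSum += c - 'A' + 1;
            }
            total += (long) letterSum * (i + 1);
        }
        return total;
    }

    public static void main(String[] args) throws java.io.FileNotFoundException {
        System.out.println(score(new Scanner(new java.io.File("names.txt"))));
    }
}
```

As a small check, "MARY","ANN","BOB" sorts to ANN (29), BOB (19), MARY (57), giving 29*1 + 19*2 + 57*3 = 238.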
I suggest running your code with a profiler. It lets you see which part is really slow (I/O, computation, etc.). If I/O is slow, look into NIO: http://docs.oracle.com/javase/1.4.2/docs/guide/nio/.
Appending strings in a loop with '+', like you do here:
/* That's actually not the problem since there is only one line. */
while(in.hasNext()){
//add the next line to the string
s+=in.next();
}
is slow, because it has to create a new string and copy everything over in each iteration. Try using a StringBuilder:
StringBuilder sb = new StringBuilder();
while(in.hasNext()){
sb.append(in.next());
}
s = sb.toString();
But you shouldn't really read the file contents into a String at all. Instead, create a String[] or an ArrayList<String> from the file contents directly:
int names = 5000; // use the correct number of lines in the file!
String[] sa = new String[names];
for(int i = 0; i < names; ++i){
sa[i] = in.next();
}
However, upon checking, it turns out that the file does not contain about 5000 lines, rather, it is all on a single line, so your big problem is actually
/* This one is the problem! */
String r = "";
for(int i = 0;i<s.length();i++){
if(s.charAt(i) != '"'){
r += s.charAt(i);
}
}
Use a StringBuilder for that. Or, make your Scanner read until the next ',' and read directly into an ArrayList<String> and just remove the double quotes from each single name in the ArrayList.
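The StringBuilder version of that quote-stripping loop might look like this (a sketch; the class and method names are hypothetical). It keeps the question's character-by-character filter but appends to a single buffer instead of building a new String for every character kept.

```java
class StripQuotes {
    // Same filter as the question's loop, but appending to one StringBuilder
    // instead of building a brand-new String for every character kept.
    static String stripQuotes(String s) {
        StringBuilder r = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '"') {
                r.append(c);
            }
        }
        return r.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"MARY\",\"PATRICIA\"")); // MARY,PATRICIA
    }
}
```

For this particular filter, the standard library one-liner s.replace("\"", "") does the same job.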
5+ seconds is quite slow for this problem. My entire web application (600 Java classes) compiles in four seconds. The root of your problem is probably the allocation of a new String for every character in the file: r += s.charAt(i)
To really speed this up, you should not use Strings at all. Get the file size, and read the whole thing into a byte array in a single I/O call:
public class Names {
private byte[] data;
private class Name implements Comparable<Name> {
private int start; // index into data
private int length;
public Name(int start, int length) { ...; }
public int compareTo(Name arg0) {
...
}
public int score()
}
public Names(File file) throws Exception {
data = new byte[(int) file.length()];
new FileInputStream(file).read(data, 0, data.length);
}
public int score() {
SortedSet<Name> names = new ...
for (int i = 0; i < data.length; ++i) {
// find limits of each name, add to the set
}
// Calculate total score...
}
}
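The skeleton above leaves its method bodies elided. The following is one hedged way to make the byte-array idea runnable, not the answerer's code. It reads the file with Files.readAllBytes (Java 7+) rather than a single raw read(), which is not guaranteed to fill the array. It also sorts a List instead of using a SortedSet, so duplicate names would not be dropped, and it assumes names contain only the ASCII letters A to Z.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ByteNames {
    private final byte[] data;

    ByteNames(byte[] data) {
        this.data = data;
    }

    static ByteNames fromFile(String path) throws IOException {
        return new ByteNames(Files.readAllBytes(Paths.get(path)));
    }

    // A name is the slice data[start, start + length); no String is created.
    private final class Name implements Comparable<Name> {
        final int start;
        final int length;

        Name(int start, int length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public int compareTo(Name other) {
            int n = Math.min(length, other.length);
            for (int i = 0; i < n; i++) {
                int diff = data[start + i] - data[other.start + i];
                if (diff != 0) {
                    return diff;
                }
            }
            return length - other.length; // a prefix sorts before the longer name
        }

        int letterSum() {
            int sum = 0;
            for (int i = start; i < start + length; i++) {
                sum += data[i] - 'A' + 1;
            }
            return sum;
        }
    }

    long score() {
        List<Name> names = new ArrayList<>();
        int i = 0;
        while (i < data.length) {
            if (data[i] < 'A' || data[i] > 'Z') {
                i++; // skip quotes, commas and any whitespace
                continue;
            }
            int start = i;
            while (i < data.length && data[i] >= 'A' && data[i] <= 'Z') {
                i++;
            }
            names.add(new Name(start, i - start));
        }
        Collections.sort(names);
        long total = 0;
        for (int k = 0; k < names.size(); k++) {
            total += (long) names.get(k).letterSum() * (k + 1);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fromFile("names.txt").score());
    }
}
```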
Depending on the application, StreamTokenizer is often measurably faster than Scanner. Examples comparing the two may be found here and here.
Addendum: Project Euler 22 includes deriving a kind of checksum of the characters in each token encountered. Rather than traversing the token twice, a custom analyzer could combine the recognition and calculation. The result would be stored in a SortedMap<String, Integer> for later iteration in finding the grand total.
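A sketch of that idea using the JDK's StreamTokenizer, stated as an approximation: each name's letter sum is computed as soon as the token is recognised and stored in a SortedMap, so the later pass only walks the map in key order. It still loops over each token's characters once more after StreamTokenizer has built it, so it is not a fully fused analyzer. It relies on '"' being one of StreamTokenizer's default quote characters, so "MARY" arrives as a single token with ttype == '"' and the text in sval, while commas are ordinary characters and are ignored. It assumes names are unique, since a map keeps one entry per key. The class name is hypothetical.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.util.SortedMap;
import java.util.TreeMap;

class TokenizerScore {
    // Computes each name's letter sum (A=1 .. Z=26) as the token is read.
    static SortedMap<String, Integer> letterSums(Reader reader) throws IOException {
        StreamTokenizer st = new StreamTokenizer(reader);
        SortedMap<String, Integer> sums = new TreeMap<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == '"') { // a quoted string token; sval holds its contents
                String name = st.sval;
                int sum = 0;
                for (int i = 0; i < name.length(); i++) {
                    sum += name.charAt(i) - 'A' + 1;
                }
                sums.put(name, sum);
            }
        }
        return sums;
    }

    // TreeMap iterates in key order, so position = alphabetical rank.
    static long total(SortedMap<String, Integer> sums) {
        long total = 0;
        int position = 1;
        for (int sum : sums.values()) {
            total += (long) sum * position++;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        try (Reader reader = new FileReader("names.txt")) {
            System.out.println(total(letterSums(reader)));
        }
    }
}
```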
An obtuse solution which you may find interesting.
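// Requires the GNU Trove library for TLongArrayList, plus imports for
// java.io.FileInputStream, java.nio.ByteBuffer and java.nio.channels.FileChannel.
long start = System.nanoTime();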
long sum = 0;
int runs = 10000;
for (int r = 0; r < runs; r++) {
FileChannel channel = new FileInputStream("names.txt").getChannel();
ByteBuffer bb = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
TLongArrayList values = new TLongArrayList();
long wordId = 0;
int shift = 63;
while (true) {
int b = bb.remaining() < 1 ? ',' : bb.get();
if (b == ',') {
values.add(wordId);
wordId = 0;
shift = 63;
if (bb.remaining() < 1) break;
} else if (b >= 'A' && b <= 'Z') {
shift -= 5;
long n = b - 'A' + 1;
wordId = (wordId | (n << shift)) + n;
} else if (b != '"') {
throw new AssertionError("Unexpected ch '" + (char) b + "'");
}
}
values.sort();
sum = 0;
for (int i = 0; i < values.size(); i++) {
long wordSum = values.get(i) & ((1 << 8) - 1);
sum += (i + 1) * wordSum;
}
}
long time = System.nanoTime() - start;
System.out.printf("%d took %.3f ms%n", sum, time / 1e6);
prints
XXXXXXX took 27.817 ms.