Removing all non-alphanumeric characters in Java

This is a program that shows how many times each word occurs in a text file. The problem is that it also picks up characters like ? and , but I only want it to pick up letters. This is just part of the results:
{"1"=1, "Cheers"=1, "Fanny"=1, "I=1, "biscuits"=1, "chairz")=1, "cheeahz"=1, "crisps"=1, "jumpers"=1, ?=20, work:=1
import java.io.File;
import java.io.FileReader;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.IOException;
import java.util.TreeMap;
import java.util.StringTokenizer;
public class Unigrammodel {
    public static void main(String [] args){
        //Creating BufferedReader to accept the file name from the user
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        String fileName = null;
        System.out.print("Please enter the file name with path: ");
        try{
            fileName = (String) br.readLine();
            //Creating the BufferedReader to read the file
            File textFile = new File(fileName);
            BufferedReader input = new BufferedReader(new FileReader(textFile));
            //Creating the Map to store the words and their occurrences
            TreeMap<String, Integer> frequencyMap = new TreeMap<String, Integer>();
            String currentLine = null;
            //Reading line by line from the text file
            while((currentLine = input.readLine()) != null){
                //Parsing the words from each line
                StringTokenizer parser = new StringTokenizer(currentLine);
                while(parser.hasMoreTokens()){
                    String currentWord = parser.nextToken();
                    //remove all non-alphanumeric from this word
                    currentWord.replaceAll(("[^A-Za-z0-9 ]"), "");
                    Integer frequency = frequencyMap.get(currentWord);
                    if(frequency == null){
                        frequency = 0;
                    }
                    //Putting each word and its occurrence into Map
                    frequencyMap.put(currentWord, frequency + 1);
                }
            }
            //Displaying the Result
            System.out.println(frequencyMap +"\n");
        }catch(IOException ie){
            ie.printStackTrace();
            System.err.println("Your entered path is wrong");
        }
    }
}

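Strings are immutable: replaceAll() does not change currentWord, it returns a new string. You need to assign that result to a variable and use the cleaned word both when looking up the current count and when adding it to the map.
String wordCleaned = currentWord.replaceAll("[^A-Za-z0-9 ]", "");
Integer frequency = frequencyMap.get(wordCleaned);
...
frequencyMap.put(wordCleaned, frequency + 1);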
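A compact, self-contained sketch of the corrected counting loop. The text is passed in as a String rather than read from a file so it runs on its own, and the class and method names are hypothetical. It also skips tokens that become empty after cleaning (a lone "?", for example), which would otherwise be counted under the empty string. Map.merge needs Java 8 or later; to keep letters only, as the question asks, use "[^A-Za-z]" instead.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class WordCount {
    // Counts words after stripping every character that is not a letter or digit.
    // Tokens that end up empty (e.g. a lone "?") are skipped rather than counted.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> frequencyMap = new TreeMap<>();
        StringTokenizer parser = new StringTokenizer(text);
        while (parser.hasMoreTokens()) {
            // replaceAll returns a NEW String; the token itself is never modified
            String word = parser.nextToken().replaceAll("[^A-Za-z0-9]", "");
            if (word.isEmpty()) {
                continue;
            }
            // Store 1 for a new word, otherwise add 1 to the existing count
            frequencyMap.merge(word, 1, Integer::sum);
        }
        return frequencyMap;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Cheers, Fanny! Cheers? ?")); // {Cheers=2, Fanny=1}
    }
}
```

In the original program you would call the same cleaning step on each token produced from currentLine.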

Related

StringBuffer: Adding a newline after a certain number of words for formatting

Okay, so my problem is formatting the output of my program. My program is meant to be a madlib: it reads in a file, lets the user enter nouns, adjectives, plurals, etc., and then prints the madlib back out with what the user entered filled in.
Here's my text file:
One of the most adjective characters in fiction is named "Tarzan of the plural-noun." Tarzan was raised by a/an noun and lives in the adjective jungle in the heart of darkest place. He spends most of this time eating plural-noun and swinging from tree to noun. Whenever he gets angry, he beats on his chest and says, "funny-noise !" This is his war cry. Tarzan always dresses in adjective shorts made from the skin of a/an noun and his best friend is a/an adjective chimpanzee names Cheetah. He is supposed to be able to speak to elephants and plural-noun. In the movies, Tarzan is played by person's-name.
The tokens I scan for in the file are enclosed in < and > (I didn't show them in the text file above, but where it says adjective or noun or funny-noise it's really <adjective>, with no spaces inside the angle brackets), and that's where the user's inputs are placed. Everything in my program works until I print it out. Instead of printing the madlib in the format above, it prints it all on one long line. It doesn't have to match the format above; I'd just like it to start a new line after a length of, for instance, 50 characters so it's easier to read.
Here's my code:
import java.util.Map;
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Scanner;
import java.util.StringTokenizer;
public class ReadFile
{
    public static void main(String[] args) //throws Exception
    {
        Scanner s=new Scanner(System.in);
        BufferedReader br = new BufferedReader(new FileReader(args[0]));
        String line;
        StringBuffer storybuffer=new StringBuffer();
        //Accept lines until next line null
        while((line=br.readLine()) != null)
            storybuffer.append(" "+line);
        //Remove first space
        storybuffer.delete(0, 1);
        String story=storybuffer.toString();
        //Split
        StringTokenizer str=new StringTokenizer(story);
        String word;
        StringBuffer finalstory=new StringBuffer();
        //Store added elements
        Map<String,String> hash=new HashMap<String,String>();
        while(str.hasMoreTokens())
        {
            word=str.nextToken();
            if(word.contains("<"))
            {
                String add="";
                //Element prompt could be more than one word
                if(!word.contains(">"))
                {
                    //Build multi-word prompt
                    String phrase="";
                    do{
                        phrase+=word+" ";
                    }while(!(word=str.nextToken()).contains(">"));
                    word=phrase+word;
                }
                //Account for element placeholder being immediately followed by . or , or whatever.
                if(word.charAt(word.length()-1)!='>')
                    add=word.substring(word.lastIndexOf('>')+1);
                //Store id of element in hash table
                String id=word.substring(0,word.lastIndexOf('>')+1);
                String value;
                if(!hash.containsKey(id))
                {
                    //New element
                    System.out.println("Enter a "+ id);
                    value=s.nextLine()+add;
                    hash.put(id, value);
                }
                //Previously entered element heres the problem for duplicates!
                else
                    value=hash.get(id);
                word=value;
            }
            finalstory.append(word+" ");
            // if(finalstory.length() > 50){
            // finalstory.append("\n");
        }
        System.out.println(finalstory.toString());
        s.close();
    }
}
Does anyone have any ideas on how to fix this?
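One way to get the wrapping, sketched as a standalone helper rather than a drop-in patch. The commented-out check tests finalstory.length(), which is the length of the whole story so far, so once it passes 50 it would be true after every word. Tracking the length of the current output line instead, and starting a new line when the next word would push it past the limit, keeps lines at or under 50 characters. The class name WrapDemo, the method wrap, and the sample text are hypothetical; in ReadFile you could print wrap(finalstory.toString(), 50) instead of finalstory directly.

```java
import java.util.StringTokenizer;

public class WrapDemo {
    // Re-flows whitespace-separated words so that no output line exceeds
    // `width` characters; a single word longer than `width` gets its own line.
    static String wrap(String text, int width) {
        StringBuilder out = new StringBuilder();
        int lineLength = 0; // characters on the current output line
        StringTokenizer words = new StringTokenizer(text);
        while (words.hasMoreTokens()) {
            String word = words.nextToken();
            if (lineLength > 0 && lineLength + 1 + word.length() > width) {
                // The space plus this word would overflow: start a new line
                out.append('\n');
                lineLength = 0;
            } else if (lineLength > 0) {
                // Separate from the previous word on the same line
                out.append(' ');
                lineLength++;
            }
            out.append(word);
            lineLength += word.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String story = "One of the most adjective characters in fiction is named "
                + "Tarzan of the plural-noun. Tarzan was raised by a/an noun.";
        System.out.println(wrap(story, 50));
    }
}
```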

Excluding the header from a .csv file

I have a .csv file with a header that I would like to skip. I get an error when the header is present in the .csv file, but when it is removed the program runs perfectly fine. I would like my code to skip the header and continue with the processing.
What the .csv file looks like:
Make,Model,Speed,Fuel,BaseMPG,ScaleFactor,Time Travelled
Ford,Mustang,0,20.2,20,0.02,2.3
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
public class Test {
    public static void main(String[] args) throws IOException {
        List<Vehicle> cars = new ArrayList<Vehicle>();
        Scanner scanner = new Scanner(System.in);
        System.out.println("Enter the file name:");
        String filename = scanner.nextLine();
        BufferedReader reader = new BufferedReader(new FileReader(new File(
                filename.trim())));
        String line = "";
        while ((line = reader.readLine()) != null) {
            String[] words = line.split(",");
            String make = words[0];
            String model = words[1];
            int currentSpeed = Integer.parseInt(words[2]);
            double fuel = Double.parseDouble(words[3]);
            double baseMpg = Double.parseDouble(words[4]);
            double scaleFactor = Double.parseDouble(words[5]);
            double timeTravelled = Double.parseDouble(words[6]);
            Vehicle car = new Car(fuel, currentSpeed, baseMpg, scaleFactor,
                    make, model, timeTravelled);
            System.out.println(car);
            cars.add(car);
        }
        FileWriter writer=new FileWriter(new File("ProcessedCars.txt"));
        for(Vehicle car:cars)
        {
            writer.write(car.toString());
            writer.flush();
            writer.write("\r\n");
        }
    }
}

Skip the first line in your while loop:
boolean skip = true;
while ((line = reader.readLine()) != null) {
    if(skip) {
        skip = false; // Skip only the first line
        continue;
    }
    String[] words = line.split(",");
    // ...
}

One way to do it is to catch the exception:
try{
    int currentSpeed = Integer.parseInt(words[2]);
    // ...
}catch(NumberFormatException e){
    // Failed to parse the speed; the line is probably text, such as the header
}
Or, if you are sure there is a header, just call an extra readLine() before your loop.
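A sketch of the extra readLine() approach, assuming the file is comma-separated as the split(",") call implies. A StringReader stands in for the file so the example runs on its own; in Test you would wrap new FileReader(new File(filename.trim())) instead. The class and method names are hypothetical, and returning only the Speed column just keeps the example small. Opening the reader in try-with-resources also closes it; the FileWriter in the question is never closed either.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SkipHeaderDemo {
    // Discards the first line as a header, then parses the Speed column
    // (index 2) of every remaining comma-separated row.
    static List<Integer> readSpeeds(BufferedReader reader) throws IOException {
        List<Integer> speeds = new ArrayList<>();
        String header = reader.readLine(); // consume the header line
        if (header == null) {
            return speeds; // empty file: nothing to parse
        }
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines, e.g. at the end of the file
            }
            String[] words = line.split(",");
            speeds.add(Integer.parseInt(words[2].trim()));
        }
        return speeds;
    }

    public static void main(String[] args) throws IOException {
        String csv = "Make,Model,Speed,Fuel,BaseMPG,ScaleFactor,Time Travelled\n"
                + "Ford,Mustang,0,20.2,20,0.02,2.3\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            System.out.println(readSpeeds(reader)); // [0]
        }
    }
}
```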

Java: Extracting values from text files

I have many text files (up to 20), and each file's contents look like this:
21.0|11|1/1/1997
13.3|12|2/1/1997
14.6|9|3/1/1997
Every file has roughly 300 lines or more.
So the problem I'm facing is this: how can I extract all of the first values, and only those, from each file's content?
For example, I want to extract the values (21.0, 13.3, 14.6, etc.) so I can find the maximum and minimum across all 20 files.
Based on my understanding, I wrote this code to try it on one of the files, but it didn't work:
String inputFileName = "Date.txt";
File inputFile = new File(inputFileName);
Scanner input = new Scanner(inputFile);
int count = 0;
while (input.hasNext()){
    double line = input.nextDouble(); //Error occurs "Exception in thread "main" java.util.InputMismatchException"
    count++;
    double [] lineArray= new double [365];
    lineArray[count]= line;
    System.out.println(count);
    for (double s : lineArray){
        System.out.println(s);
        System.out.println(count);
    }
}
And this one too:
String inputFileName = "Date.txt";
File inputFile = new File(inputFileName);
Scanner input = new Scanner(inputFile);
while (input.hasNext()){
    String line = input.nextLine();
    String [] lineArray = line.split("//|");
    for (String s : lineArray){
        System.out.println(s+" ");
    }
}
Note: I'm still something of a beginner in Java.
I hope I was clear, and thanks.

For each line of text, check whether it contains the pipe character. If it does, take the text before the first pipe and parse it as a double.
double val = 0.0;
String data;
Scanner fScn = new Scanner(new File("date.txt"));
while(fScn.hasNextLine()){ //Can also use a BufferedReader
    data = fScn.nextLine();
    if(data.contains("|")) //Ensure line contains "|"
        val = Double.parseDouble(data.substring(0, data.indexOf("|"))); //grab value
    // compare val with your running minimum and maximum here
}
fScn.close();

Or you could try streams:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class MinMaxPrinter {
    public static void main(String[] args) {
        final List<String> files = Arrays.asList("file", "names", "that", "you", "need");
        new MinMaxPrinter().printMinMax(files);
    }
    public void printMinMax(List<String> fileNames) {
        List<Double> numbers = fileNames.stream()
                .map(Paths::get)
                .flatMap(this::toLines)
                .map(line -> line.split("\\|")[0])
                .map(Double::parseDouble)
                .collect(Collectors.toList());
        double max = numbers.stream().max(Double::compare).get();
        double min = numbers.stream().min(Double::compare).get();
        System.out.println("Min: " + min + " Max: " + max);
    }
    private Stream<String> toLines(Path path) {
        try {
            return Files.lines(path);
        } catch (IOException e) {
            return Stream.empty();
        }
    }
}

try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
        // split() takes a regex, so the pipe has to be escaped
        String res = line.split("\\|")[0];
    }
}
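A sketch that combines the answers above and tracks the minimum and maximum across several files at once, using java.util.DoubleSummaryStatistics (Java 8+) instead of collecting every number into a list. Two in-memory strings stand in for the real files; with actual files you would open each one with new BufferedReader(new FileReader(name)) in the same try-with-resources loop. The class and method names are hypothetical, and the sketch assumes every non-blank line starts with a number. It also shows why the second attempt in the question misbehaves: split() takes a regular expression, so "//|" means "//" or the empty string, and splitting on the empty string breaks the line into single characters. The pipe has to be escaped as "\\|".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.DoubleSummaryStatistics;

public class FirstColumnStats {
    // Feeds the first '|'-separated field of every line into `stats`.
    // "\\|" escapes the pipe; an unescaped '|' is regex alternation.
    static void accumulate(BufferedReader reader, DoubleSummaryStatistics stats) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            String first = line.split("\\|")[0].trim();
            if (first.isEmpty()) {
                continue; // skip blank lines
            }
            stats.accept(Double.parseDouble(first));
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for the real data files
        String[] files = {
            "21.0|11|1/1/1997\n13.3|12|2/1/1997\n",
            "14.6|9|3/1/1997\n"
        };
        DoubleSummaryStatistics stats = new DoubleSummaryStatistics();
        for (String content : files) {
            try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
                accumulate(reader, stats);
            }
        }
        System.out.println("Min: " + stats.getMin() + " Max: " + stats.getMax()); // Min: 13.3 Max: 21.0
    }
}
```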

BufferedReader: read a certain line and text

This is my first post, so I'm not sure how things work here.
Basically, I need some help/advice with my code. The method needs to read a certain line and print out the text that comes after the entered text and "=".
The text file looks like this:
A = Ant
B = Bird
C = Cat
So if the user inputs "A", it should print out something like:
-Ant
So far, I've managed to make it ignore "=", but it still prints out the whole file.
Here is my code:
public static void readFromFile() {
    System.out.println("Type in your word");
    Scanner scanner = new Scanner(System.in);
    String input = scanner.next();
    String output = "";
    try {
        FileReader fr = new FileReader("dictionary.txt");
        BufferedReader br = new BufferedReader(fr);
        String[] fields;
        String temp;
        while((input = br.readLine()) != null) {
            temp = input.trim();
            if (temp.startsWith(input)) {
                String[] splitted = temp.split("=");
                output += splitted[1] + "\n";
            }
        }
        System.out.print("-"+output);
    }
    catch(IOException e) {
    }
}

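It looks like this line is the problem, as it will always be true:
if (temp.startsWith(input))
Inside the loop, input has been overwritten with the line just read from the file, and temp is that same line trimmed, so the check compares the line with itself instead of with what the user typed. You need separate variables for the lines read from the file and for the input you're holding from the user. Try something like: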
String fileLine;
while((fileLine = br.readLine()) != null)
{
    temp = fileLine.trim();
    if (temp.startsWith(input))
    {
        String[] splitted = temp.split("=");
        output += splitted[1] + "\n";
    }
}

You can use Scanner's useDelimiter() method to split the input text:
scanner.useDelimiter("(.)*="); // Matches 0 or more characters followed by '=', and then gives you what is after `=`
Here is code I tried on Ideone (http://ideone.com/TBwCFj):
Scanner s = new Scanner(System.in);
s.useDelimiter("(.)*=");
while(s.hasNext())
{
    String ss = s.next();
    System.out.print(ss);
}
/**
 * Output
 *
 * Ant
 * Bat
 */

You need to first split the text file on newlines ("\n", assuming each declaration such as "A = Ant", "B = Bird", "C = Cat" starts on a new line), then locate the entered character and further split that line on "=" as you were doing.
So you will need two String arrays (String[]): one for the lines and one for splitting each line into, e.g., "A" and "Ant".
You are very close.
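The two-step split described above can be sketched like this, assuming the whole file has already been read into one String (for example with new String(Files.readAllBytes(Paths.get("dictionary.txt")))). The class and method names are hypothetical. Splitting with a limit of 2 keeps any further '=' inside the value, and trimming tolerates the spaces around '=' as well as Windows line endings.

```java
public class DictionaryLookup {
    // Step 1: split the whole file text into lines.
    // Step 2: split each line on the first '=' and compare the trimmed key
    // with the user's input. Returns the trimmed value, or null if not found.
    static String lookup(String fileText, String input) {
        String[] lines = fileText.split("\n");
        for (String line : lines) {
            String[] parts = line.split("=", 2); // at most two pieces: key and value
            if (parts.length == 2 && parts[0].trim().equals(input.trim())) {
                return parts[1].trim(); // trim() also drops a trailing '\r'
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String fileText = "A = Ant\nB = Bird\nC = Cat\n";
        String value = lookup(fileText, "B");
        System.out.println(value == null ? "Not found" : "-" + value); // -Bird
    }
}
```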

Try this; it works. Steps:
1) Read the input using a Scanner.
2) Read the file using a BufferedReader.
3) Split each line using "=" as the delimiter.
4) Compare the part before "=" (trimmed) with the input.
5) If they are equal, print the associated value, preceded by a "-".
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Scanner;
class myRead{
    public static void main(String[] args) throws FileNotFoundException, IOException {
        System.out.println("Type in your word");
        Scanner scanner = new Scanner(System.in);
        String input = scanner.next();
        BufferedReader myReader = new BufferedReader(new FileReader("test.txt"));
        String line = myReader.readLine();
        while(line != null){
            String[] parts = line.split("=");
            if (parts[0].trim().equals(input.trim())) {
                System.out.println("-"+parts[1]);
            }
            line = myReader.readLine();
        }
        myReader.close();
    }
}
OUTPUT (DEPENDING ON INPUT):
- Ant
- Bird
- Cat

How to tokenize an input file in Java

I'm tokenizing a text file in Java. I want to read an input file, tokenize it, and write certain tokens to an output file. This is what I've done so far:
package org.apache.lucene.analysis;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StreamTokenizer;
class StringProcessing {
    // Create BufferedReader class instance
    public static void main(String[] args) throws IOException {
        InputStreamReader input = new InputStreamReader(System.in);
        BufferedReader keyboardInput = new BufferedReader(input);
        System.out.print("Please enter a java file name: ");
        String filename = keyboardInput.readLine();
        if (!filename.endsWith(".DAT")) {
            System.out.println("This is not a DAT file.");
            System.exit(0);
        }
        File File = new File(filename);
        if (File.exists()) {
            FileReader file = new FileReader(filename);
            StreamTokenizer streamTokenizer = new StreamTokenizer(file);
            int i = 0;
            int numberOfTokensGenerated = 0;
            while (i != StreamTokenizer.TT_EOF) {
                i = streamTokenizer.nextToken();
                numberOfTokensGenerated++;
            }
            // Output number of characters in the line
            System.out.println("Number of tokens = " + numberOfTokensGenerated);
            // Output tokens
            for (int counter = 0; counter < numberOfTokensGenerated; counter++) {
                char character = file.toString().charAt(counter);
                if (character == ' ') { System.out.println(); } else { System.out.print(character); }
            }
        } else {
            System.out.println("File does not exist!");
            System.exit(0);
        }
        System.out.println("\n");
    }//end main
}//end class
When I run this code, this is what I get:
Please enter a java file name: D://eclipse-java-helios-SR1-win32/LexractData.DAT
Number of tokens = 129
java.io.FileReader@19821f
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 25
at java.lang.String.charAt(Unknown Source)
at org.apache.lucene.analysis.StringProcessing.main(StringProcessing.java:40)
The input file will look like this:
-K1 Account
--Op1 withdraw
---Param1 an
----Type Int
---Param2 amount
----Type Int
--Op2 deposit
---Param1 an
----Type Int
---Param2 Amount
----Type Int
--CA1 acNo
---Type Int
-K2 CheckAccount
--SC Account
--CA1 credit_limit
---Type Int
-K3 Customer
--CA1 name
---Type String
-K4 Transaction
--CA1 date
---Type Date
--CA2 time
---Type Time
-K5 CheckBook
-K6 Check
-K7 BalanceAccount
--SC Account
I just want to read the strings that start with -K1, -K2, -K3, and so on. Can anyone help me?

The problem is with this line:
char character = file.toString().charAt(counter);
file is a reference to a FileReader, which does not override toString(), so the call falls through to Object.toString(). That returns the class name, an '@' and a hexadecimal hash code: a string only about 25 characters long, which is what gets printed just before the crash. Because counter runs up to the number of tokens (129) rather than the length of that string, you get a StringIndexOutOfBoundsException at index 25, the 26th character.
To read the file correctly, wrap your FileReader in a BufferedReader and append each line returned by readLine() to a StringBuilder.
FileReader fr = new FileReader(filename);
BufferedReader br = new BufferedReader(fr);
StringBuilder sb = new StringBuilder();
String s;
while((s = br.readLine()) != null) {
    sb.append(s).append('\n'); // readLine() strips the line terminator, so add it back
}
br.close();
// Now use sb.toString() instead of file.toString()

If you want to tokenize the input file, the obvious choice is a Scanner. The Scanner class reads a given input stream and can return either tokens or other scanned types (scanner.nextInt(), scanner.nextLine(), etc.).
import java.util.Scanner;
import java.io.File;
import java.io.IOException;
public class ScanTokens { // class name is arbitrary
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the Scanner (and the file) when done
        try (Scanner in = new Scanner(new File("filename.dat"))) {
            while (in.hasNext()) {
                String s = in.next(); //get the next token in the file
                // Now s contains a token from the file
            }
        }
    }
}
Check out Oracle's documentation of the Scanner class for more info.

import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.StringTokenizer;
public class FileTokenize {
    public static void main(String[] args) throws IOException {
        // var needs Java 10+, Path.of needs Java 11+
        final var lines = Files.readAllLines(Path.of("myfile.txt"));
        // try-with-resources closes (and flushes) the writer even if an exception occurs
        try (FileWriter writer = new FileWriter("output.txt")) {
            for (String data : lines) {
                StringTokenizer token = new StringTokenizer(data);
                while (token.hasMoreTokens()) {
                    writer.write(token.nextToken() + "\n");
                }
            }
        }
    }
}
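For the original goal of keeping only the entries that start with -K1, -K2, -K3 and so on, a line-based filter is enough and no tokenizer is needed. This sketch assumes such lines look like "-K" followed by digits and then, optionally, a space and a name; the class name, method name, and regex are assumptions based on the sample input. With the real file you could pass Files.readAllLines(Paths.get(filename)) to kLines and write the result with Files.write(Paths.get("output.txt"), result).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class KLineFilter {
    // Keeps only top-level entries: lines that are "-K" plus one or more
    // digits, optionally followed by whitespace and a name (e.g. "-K1 Account").
    // Lines such as "--Op1 withdraw" start with "--" and are dropped.
    static List<String> kLines(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.matches("-K\\d+(\\s.*)?")) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "-K1 Account", "--Op1 withdraw", "---Type Int",
                "-K2 CheckAccount", "--SC Account", "-K5 CheckBook");
        for (String line : kLines(sample)) {
            System.out.println(line);
        }
    }
}
```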

Develop Reference
