I'm trying to read from multiple .txt files in a directory using a scanner in Java.
So far, I have
File directory = new File("textanalyzer/Shakespeare");
File[] filenames = directory.listFiles();
Scanner scanner = new Scanner(new File(filenames)).useDelimiter("[^a-zA-Z<]+");
The rest of my program uses the text from these files. I have the rest of the program written but I'm stuck on this one thing.
I've been looking around for a solution, but I can't really find anything. I know that what I have isn't very good but I don't know enough Java to be able to improve it. I've also tried using Apache imports but I can't figure out how to make them work (FileIterator, in particular).
Finally, I would really like to use the Scanner class so that I can use the Delimiter. It is super helpful for what I'm trying to do.
Not quite sure what your goal is but this basic example might help.
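File[] fileArray = new File("textanalyzer/Shakespeare").listFiles();
if (fileArray != null) // listFiles() returns null if the path is not a directory
{
    for (File f : fileArray) // loop through all files
    {
        if (f.getName().endsWith(".txt")) // only deal with the .txt files
        {
            Scanner s = new Scanner(f); // to read the file
            // ... read from s here ...
            s.close(); // release the file handle before moving on
        }
    }
}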
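To connect this back to the question, here is a minimal sketch that opens one Scanner per .txt file, applies the asker's delimiter, and gathers every token into a single list. The class name DirectoryWordReader and the method readWords are made up for illustration, and sorting the files is my own assumption so the order is predictable.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not part of the original answer.
class DirectoryWordReader {

    /** Collects every token from each .txt file directly inside the given directory. */
    static List<String> readWords(File directory) throws FileNotFoundException {
        List<String> words = new ArrayList<>();
        File[] files = directory.listFiles();
        if (files == null) {
            // Not a directory, or it could not be listed.
            return words;
        }
        // listFiles() makes no promise about order, so sort for predictable results.
        Arrays.sort(files);
        for (File f : files) {
            if (!f.isFile() || !f.getName().endsWith(".txt")) {
                continue;
            }
            // One Scanner per file; try-with-resources closes it when the file is done.
            try (Scanner scanner = new Scanner(f).useDelimiter("[^a-zA-Z<]+")) {
                while (scanner.hasNext()) {
                    words.add(scanner.next());
                }
            }
        }
        return words;
    }
}
```

You would call it as DirectoryWordReader.readWords(new File("textanalyzer/Shakespeare")) and hand the resulting list to the rest of the program.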
I have this text file of the format:
Token:A1
sometext
Token:A2
sometext
Token:A3
I want to split this file into multiple files, such that
File 1 contains
A1
sometext
File 2 contains
A2
sometext
I don't know much about any programming or scripting language. What would be the best way to go about this? I was thinking of using Java to solve the problem.
If you want to use Java, I would look into using Scanner in conjunction with File and PrintWriter. With a for loop and some exception handling, you will be good to go.
Import the proper libraries!
import java.io.*;
import java.util.*;
Declare the class, of course:
public class someClass{
    public static void main(String [] args){
Now here's where stuff starts to get interesting. We use the File class to create a File object, passing the name of the file to be read as a parameter. You can put whatever you want there, whether it's a full path to the file or just the file name if the file is in the directory you run the program from.
        File currentFile = new File("new.txt");
        if (currentFile.exists() && currentFile.canRead()){
            try{
Next we create a Scanner to scan through that File object. The for loop continues as long as the file has new tokens to scan through; .hasNext() returns true only if the scanner's input has another token. By default, tokens are separated by whitespace, so each word ends up in its own file. PrintWriter creates and writes the files. I have it set up to name the files after the iteration of the loop (0, 1, 2, 3, etc.), but that can easily be changed (see new PrintWriter(i + ".txt", "UTF-8")).
                Scanner textContents = new Scanner(currentFile);
                for(int i = 0; textContents.hasNext(); i++){
                    PrintWriter writer = new PrintWriter(i + ".txt", "UTF-8");
                    writer.println(textContents.next());
                    writer.close();
                }
                textContents.close(); // release the input file
These catch statements are super important! Your code won't even compile without them, because both exceptions are checked. If there is an error, they will make sure your code doesn't crash. I left the inside of them empty so you can do what you see fit.
            } catch (FileNotFoundException e) {
                // do something
            }
            catch (UnsupportedEncodingException i){
                //do something
            }
        }
    }
}
And that's pretty much it! If you have any questions, be sure to comment!
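Since the loop above writes one whitespace-separated token per file, here is a rough sketch of a variant that starts a new output file at every line beginning with "Token:", which is closer to the layout in the question. The class name TokenFileSplitter, the output names 1.txt, 2.txt and so on, and the UTF-8 encoding are all assumptions of mine.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class TokenFileSplitter {

    private static final String MARKER = "Token:";

    /** Starts a new output file at each "Token:" line and returns how many files were written. */
    static int split(Path input, Path outputDir) throws IOException {
        List<String> lines = Files.readAllLines(input, StandardCharsets.UTF_8);
        int fileCount = 0;
        PrintWriter writer = null;
        try {
            for (String line : lines) {
                if (line.startsWith(MARKER)) {
                    if (writer != null) {
                        writer.close(); // finish the previous section
                    }
                    fileCount++;
                    // Assumed naming scheme: 1.txt, 2.txt, ...
                    Path out = outputDir.resolve(fileCount + ".txt");
                    writer = new PrintWriter(Files.newBufferedWriter(out, StandardCharsets.UTF_8));
                    // Keep only the part after "Token:", e.g. "A1".
                    writer.println(line.substring(MARKER.length()));
                } else if (writer != null) {
                    writer.println(line);
                }
                // Lines that appear before the first marker are skipped.
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
        return fileCount;
    }
}
```

With the sample input from the question, this would produce three files, the first containing A1 and sometext.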
There is no single best way; it depends on your environment and needs. But for any language, figure out your basic algorithm first and try to use the best available data structure(s). If you are using Java, consider Guava's Splitter, and do look into its implementation.
I found the following useful in the past for reading in text files:
new Scanner(file).useDelimiter("\\Z").next();
However, I came across a file today that was only partially read in with this syntax. I'm not sure what makes this file special; it's just a .jsp.
I found that the code below worked in this instance, but I'd like to know why the previous method didn't work.
Scanner in = new Scanner(new FileReader(file));
String text = in.useDelimiter("\\Z").next();
Save the .jsp file as .txt and try to read it using your first method. If it works, I feel size can be the issue.
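The thread does not settle why the first form stopped early. One possibility, and it is only a guess here, is an encoding problem: Scanner never throws on a read error, it simply behaves as if the input had ended and remembers the exception, which you can retrieve with ioException(). The sketch below shows two ways to read a whole file with an explicit charset so that such a failure becomes visible. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

// Hypothetical helper, not taken from the thread.
class WholeFileReader {

    /** Reads the entire file and decodes it with the given charset. */
    static String readAll(Path file, Charset charset) throws IOException {
        return new String(Files.readAllBytes(file), charset);
    }

    /** Same idea with the "\\Z" idiom, but rethrows any error the Scanner swallowed. */
    static String readAllWithScanner(Path file, Charset charset) throws IOException {
        try (Scanner scanner = new Scanner(file, charset.name())) {
            scanner.useDelimiter("\\Z");
            // An empty file has no token at all, so guard the call to next().
            String text = scanner.hasNext() ? scanner.next() : "";
            IOException problem = scanner.ioException();
            if (problem != null) {
                throw problem; // the read stopped early; report it instead of returning partial text
            }
            return text;
        }
    }
}
```

If readAllWithScanner throws for the troublesome file, the exception message should point at the real cause. Note that "\\Z" matches before a final line terminator, so the Scanner variant drops a trailing newline while readAll keeps it.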
I have 80,000 words for a crossword (among others) puzzle word pattern matcher. (User inputs "ba??" and gets, among other things, "ball, baby, bank, ..." or enters "ba*" and gets the aforementioned as well as "bat, basket, baboon...".)
I stuck the words in a Netbeans "empty file" and named it "dictionary". The file's contents are just (80,000) words, one per line. This code works like a charm to read the dictionary (code that filters is omitted):
static void showMatches(String pattern, String legal, String w) throws IOException
{
    Path p = Paths.get("C:\\Users\\Dov\\Documents\\NetBeansProjects\\Masterwords\\src\\masterwords\\dictionary");
    String word;
    Scanner sc = new Scanner(p).useDelimiter("\r");
    while(sc.hasNext()){
        word = sc.next().substring(1);
        gui.appendOutput(word);
    }
    sc.close(); // close() releases the file; reset() only restores the delimiter, locale and radix
}
Is there a way to make the file (named "dictionary") become part of the compiled jar file so that I only need to "ship" one file to new, (largely helpless) users?
In another matter of curiosity...
Is it possible to make the argument to Paths.get(...) something like "masterwords/src/dictionary" to make the connection for the Scanner object to be able read it? I'm wondering if this might relate to an answer my first question. (If there's a way, I can't stumble onto it. Whatever similar string I use, I get no error, no output, no "build successful"--gotta click Run > Stop build/run.)
I'm not certain, based on your description, that my solution addresses your issue, but let me restate the problem as I understand it: You have a .jar file that relies on a dictionary resource. That resource is subject to change, and you'd like to be able to update it without having to ship out a whole new .jar containing a new dictionary.
If I'm reading you correctly, you want something like:
private File getInstallPath()
{
    return new File(MyClass.class.getProtectionDomain().getCodeSource().getLocation().getPath());
}
This will return the install directory of your .jar file, which is where you can put your dictionary resource so that the .jar knows where to find it. Of course, now you have a bit of a training issue, because users can move, delete or misplace your dictionary file.
Part II:
Now that you've clarified your question, let me again restate: You want to be able to read an arbitrary file included in your .jar file. Fine. You're probably trying to open the file as a file, but once the file is in your .jar, you need to treat it as a resource.
Try using:
Class<?> myClass = Class.forName("MyClass");
ClassLoader myLoader = myClass.getClassLoader();
InputStream myStream = myLoader.getResourceAsStream(myFile);
Do you really need me to explain what "myClass," "myLoader," etc. refer to? Hint: "myClass" is whatever your class is that needs to read the file.
After leaving this thread in frustration for a couple of weeks, yesterday I found a similar question at this forum, which led me to Google "java resource files" and visit ((this URL)).
Between the two I figured out how to read a file named 'dictionary' that was created as a Netbeans "empty Java file", which was located in Source Packages ... [default package] (as shown in Netbeans Projects window) and stored as C:\Users\Dov\!Docs\Documents\NetBeansProjects\WordPatternHelp\src\dictionary:
File file = new File("src/dictionary");
...
p = file.toPath();
sc = new Scanner(p).useDelimiter("\r");
Success. Hooray.
But after compiling and executing the .jar file from a DOS command line, 'dictionary' couldn't be found. So the above only works from within Netbeans IDE.
After mostly erroneous attempts caused by the above 'success', I finally got success using @Mars' second suggestion like so:
package masterwords;
public class Masterwords
...
InputStream myStream = Class.forName("masterwords.Masterwords").
getClassLoader().getResourceAsStream("dictionary");
sc = new Scanner(myStream).useDelimiter("\r"); // NULL PTR EXCEPTION HERE
So, for whatever it might be worth, a very belated thanks (and another apology) to @Mars. It was as straightforward as he indicated. Wish I'd tried it 2 weeks ago, but I'd never seen any of the methods and didn't want to take the time to learn how they work back then with other more pressing issues at hand. So I had no idea Mars had actually written the exact code I needed (except for the string arguments). Boy, do I know how the methods work now.
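For anyone landing here later, this is a small sketch of the resource approach with the one check that is easy to forget: getResourceAsStream returns null when the resource is not on the classpath, and passing that null to Scanner is what produces a NullPointerException. DictionaryLoader and its method names are invented for this example, and UTF-8 is an assumed encoding.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not the poster's actual class.
class DictionaryLoader {

    /** Reads one word per line; nextLine() copes with \n, \r\n and \r line endings. */
    static List<String> readWords(InputStream in) {
        List<String> words = new ArrayList<>();
        try (Scanner sc = new Scanner(in, "UTF-8")) {
            while (sc.hasNextLine()) {
                String word = sc.nextLine().trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    /** Loads a resource by name relative to the classpath root, e.g. "dictionary". */
    static List<String> loadFromClasspath(String resourceName) {
        InputStream in = DictionaryLoader.class.getClassLoader().getResourceAsStream(resourceName);
        if (in == null) {
            // Fail with a clear message instead of a NullPointerException inside Scanner.
            throw new IllegalArgumentException("Resource not found on classpath: " + resourceName);
        }
        return readWords(in);
    }
}
```

Reading line by line also avoids the substring(1) step that the "\r" delimiter needed to strip the leftover "\n".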
I'm not looking for any answers that involve opening the zip file in a zip input or output stream. My question is: is it possible in Java to simply open a jar file like any other file (using a buffered reader/writer), read its contents, and write them somewhere else? For example:
import java.io.*;
public class zipReader {
    public static void main(String[] args){
        BufferedReader br = new BufferedReader(new FileReader((System.getProperty("user.home").replaceAll("\\\\", "/") + "/Desktop/foo.zip")));
        BufferedWriter bw = new BufferedWriter(new FileWriter((System.getProperty("user.home").replaceAll("\\\\", "/") + "/Desktop/baf.zip")));
        char[] ch = new char[180000];
        while(br.read(ch) > 0){
            bw.write(ch);
            bw.flush();
        }
        br.close();
        bw.close();
    }
}
This works on some small zip/jar files, but most of the time it just corrupts them, making it impossible to unzip or execute them. I have found that setting the size of the char[] to 1 will not corrupt the file itself, just everything in it, meaning I can open the file in an archive program but all its entries will be corrupted and unusable. Does anyone know how to write the above code so it won't corrupt the file? Also, here is a line from a jar file I tested this on that became corrupted:
nèñà?G¾Þ§V¨ö—‚?‰9³’?ÀM·p›a0„èwåÕüaEܵp‡aæOùR‰(JºJ´êgžè*?”6ftöãÝÈ—ê#qïc3âi,áž…¹¿Êð)V¢cã>Ê”G˜(†®9öCçM?€ÔÙÆC†ÑÝ×ok?ý—¥úûFs.‡
vs the original:
nèñàG¾Þ§V¨ö—‚‰9³’ÀM·p›a0„èwåÕüaEܵp‡aæOùR‰(JºJ´êgžè*?”6ftöãÝÈ—ê#qïc3âi,áž…¹¿Êð)V¢cã>Ê”G˜(†®9öCçM€ÔÙÆC†ÑÝ×oký—¥úûFs.‡
As you can see, either the reader or the writer adds ?'s into the file, and I can't figure out why. Again, I don't want any answers telling me to open it entry by entry; I already know how to do that. If anyone knows the answer to my question, please share it.
Why would you want to convert binary data to chars? I think it would be much better to use InputStream/OutputStream with byte arrays. See http://www.javapractices.com/topic/TopicAction.do?Id=245 for examples.
bw.write(ch) will write the entire array. Read will only fill in some of it, and return a number telling you how much. This is nothing to do with zip files, just with how IO works.
You need to change your code to look more like:
int charsRead = br.read(buffer);
if (charsRead >= 0) {
    bw.write(buffer, 0, charsRead);
} else {
    // whatever I do at the end.
}
However, this is only half of your problem. You are also converting bytes to characters and back again, which will corrupt the data in other ways. Stick to streams.
See the ZipInputStream and ZipOutputStream classes.
Edit: use plain FileInputStream and FileOutputStream. I suspect there may be some issues when the reader is interpreting the bytes as characters.
See also: Standard concise way to copy a file in Java? Since you want to copy the whole file, there is nothing special about it being a zip file.
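Putting the two points from the answers together (write only the bytes that were actually read, and never decode to characters), a byte-for-byte copy might look like the sketch below. BinaryFileCopier and the 8192-byte buffer size are my own choices, not something from the thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating the advice in the answers above.
class BinaryFileCopier {

    /** Copies source to target byte for byte and returns the number of bytes copied. */
    static long copy(Path source, Path target) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            int bytesRead;
            // read() returns how many bytes it placed in the buffer, or -1 at end of file.
            while ((bytesRead = in.read(buffer)) != -1) {
                // Write only the filled part of the buffer, never the whole array.
                out.write(buffer, 0, bytesRead);
                total += bytesRead;
            }
        }
        return total;
    }
}
```

If all you need is the copy itself, Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING) from java.nio.file does the same job in one call.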
I'm using Eclipse (SDK v4.2.2) to develop a Java project (Java SE, v1.6) that currently reads information from external .txt files as part of methods used many times in a single pass. I would like to include these files in my project, making them "native" to make the project independent of external files. I don't know where to add the files into the project or how to add them so they can easily be used by the appropriate method.
Searching on Google has not turned up any solid guidance, nor have I found any similar questions on this site. If someone knows how to add files and where they should go, I'd greatly appreciate any advice or even a point in the right direction. Also, if any additional information about the code or the .txt files is required, I'll be happy to provide as much detail as possible.
UPDATE 5/20/2013: I've managed to get the text files into the classpath; they're located in a package under a folder called 'resc' (per dharam's advice), which is on the same classpath level as the 'src' folder in which my code is packaged. Now I just need to figure out how to get my code to read these files properly. Specifically, I want to read a selected file into a two-dimensional array, reading line-by-line and splitting each line by a delimiter. Prior to packaging the files directly within the workspace, I used a BufferedReader to do this:
public static List<String[]> fileRead(String d) {
    // Initialize File 'f' with path completed by passed-in String 'd'.
    File f = new File("<incomplete directory path goes here>" + d);
    // Initialize some variables to be used shortly.
    String s = null;
    List<String> a = new ArrayList<String>();
    List<String[]> l = new ArrayList<String[]>();
    try {
        // Use new BufferedReader 'in' to read in 'f'.
        BufferedReader in = new BufferedReader(new FileReader(f));
        // Read the first line into String 's'.
        s = in.readLine();
        // So long as 's' is NOT null...
        while(s != null) {
            // Split the current line, using semi-colons as delimiters, and store in 'a'.
            // Convert 'a' to array 'aSplit', then add 'aSplit' to 'l'.
            a = Arrays.asList(s.split("\\s*;\\s*"));
            String[] aSplit = a.toArray(new String[2]);
            l.add(aSplit);
            // Read next line of 'f'.
            s = in.readLine();
        }
        // Once finished, close 'in'.
        in.close();
    } catch (IOException e) {
        // If problems occur during 'try' code, catch exception and include StackTrace.
        e.printStackTrace();
    }
    // Return value of 'l'.
    return l;
}
If I decide to use the methods described in the link provided by Pangea (using getResourceAsStream to read in the file as an InputStream), I'm not sure how I would achieve the same results. Could someone help me find a solution here, or should I ask about that issue in a separate question to prevent headaches?
You can put them anywhere you wish; the best place depends on what you want to achieve by including the files.
A general practice is to create a folder named resc or resource, put the files in it, and include that folder in the classpath.
You can store the files within a Java package and read them as classpath resources. For example, you can add the text files to a Java package, say com.foo, and use this thread to learn how to read them: How to really read text file from classpath in Java
This way they are independent of the environment and are co-packaged with code itself.
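As a rough illustration of how the BufferedReader approach from the question carries over, the sketch below wraps the resource's InputStream in an InputStreamReader and keeps the same split on semicolons. ResourceTableReader is a made-up name, the resource path passed to fileRead is whatever your package layout dictates (something like "/com/foo/data.txt" would be a hypothetical example), and it assumes Java 7 or later for try-with-resources and StandardCharsets.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, assuming UTF-8 text files on the classpath.
class ResourceTableReader {

    /** Splits every line of the stream on semicolons, trimming surrounding whitespace. */
    static List<String[]> parse(InputStream in) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line.split("\\s*;\\s*"));
            }
        }
        return rows;
    }

    /** Looks the resource up on the classpath; a leading "/" means "from the classpath root". */
    static List<String[]> fileRead(String resourcePath) throws IOException {
        InputStream in = ResourceTableReader.class.getResourceAsStream(resourcePath);
        if (in == null) {
            throw new FileNotFoundException("Resource not found on classpath: " + resourcePath);
        }
        return parse(in);
    }
}
```

Unlike the original method, this version does not pad short rows to two elements, so add that back if the rest of the code relies on it.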
Add the files to the project's classpath. (You can find the classpath of the project by right-clicking the project in Eclipse -> Build Path -> Configure Build Path.)
I guess you want an internal .txt file.
Package Explorer => right-click your project => New => File. Then type a file name and click Finish.
The path in your code should look like this:
Scanner diskScanner = new Scanner(new File("YourFile"));