I have this code to search a document, collect its sentences into an ArrayList<StringBuffer>, and save that object to a file:
public static void save(String doc_path) {
StringBuffer text = new StringBuffer(new Corpus().createDocument(doc_path + ".txt").getDocStr());
ArrayList<StringBuffer> lines = new ArrayList<>();
Matcher matcher = compile("(?<=\n).*").matcher(text);
while (matcher.find()) {
String line_str = matcher.group();
if (checkSentenceLine(line_str)){
lines.add(new StringBuffer(line_str));
}
}
FilePersistence.save (lines, doc_path + ".lin");
FilePersistence.save (lines.toString(), doc_path + "_extracoes.txt");
}
Corpus
public Document createDocument(String file_path) {
File file = new File(file_path);
if (file.isFile()) {
return new Document(file);
} else {
Message.displayError("file path is not OK");
return null;
}
}
FilePersistence
public static void save (Object object_root, String file_path){
if (object_root == null) return;
try{
ObjectOutputStream output = new ObjectOutputStream(new FileOutputStream (file_path));
output.writeObject(object_root);
output.close();
} catch (Exception exception){
System.out.println("Fail to save file: " + file_path + " --- " + exception);
}
}
public static Object load (String file_path){
try{
ObjectInputStream input = new ObjectInputStream(new FileInputStream (file_path));
Object object_root = input.readObject();
return object_root;
}catch (Exception exception){
System.out.println("Fail to load file: " + file_path + " --- " + exception);
return null;
}
}
The problem is that the document uses right single quotation marks (U+2019) as apostrophes. When I load it and print it in NetBeans I get odd squares instead of apostrophes, and Â' if I open the file in Notepad. This prevents me from handling the extracted sentences properly, or at least from displaying them properly. At first I thought it was due to an encoding incompatibility.
Then I tried changing the encoding in the project properties to CP1252, but that only changes the blank squares to question marks, and Notepad still shows the same Â'.
I also tried using
String line_str = matcher.group().replace("’","'")
and
String line_str = matcher.group().replace('\u2019','\'')
but neither has any effect.
Update:
if (checkSentenceLine(line_str)){
System.out.println(line_str);
lines.add(new StringBuffer(line_str));
}
This is before saving to the binary file, and the single quotes are already messed up: they show as blank squares in UTF-8 and as ? in CP1252. That makes me think the problem occurs when reading from the .txt file.
The weird thing is that if I do this:
System.out.println('\u2019');
it shows a perfect right single quote. The problem occurs only when reading from a .txt file, which makes me think it lies in the method I'm using to read the file. It also happens with bullet point symbols.
Maybe the problem is in converting the StringBuffer to a String? If so, how could I prevent this from happening?
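The question does not show how Document.getDocStr() reads the file, so the following is only a guess at the cause: if the .txt file is stored as UTF-8 but is decoded with a different charset (for example the platform default), U+2019 turns into several unrelated characters before any regex or StringBuffer code runs. A minimal sketch of that effect, and of reading with an explicit charset instead; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetReadDemo {
    /** Decodes raw bytes with the given charset. */
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    /** Reads a whole file and decodes it with an explicit charset instead of the platform default. */
    public static String readText(Path path, Charset charset) throws IOException {
        return decode(Files.readAllBytes(path), charset);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file: the text it’s, stored as UTF-8.
        Path sample = Files.createTempFile("sample", ".txt");
        Files.write(sample, "it\u2019s".getBytes(StandardCharsets.UTF_8));

        String right = readText(sample, StandardCharsets.UTF_8);
        String wrong = readText(sample, StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 4: i, t, U+2019, s
        System.out.println(wrong.length()); // 6: the three UTF-8 bytes of U+2019 became three characters
        Files.delete(sample);
    }
}
```

If this is what is happening, String.replace cannot help, because the string no longer contains U+2019 by the time replace runs. Which charset the file really uses is an assumption to verify first.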
Related
I am trying to read a .csv file in a Java program. The file has some cells which contain multiple lines.
I am on Linux, so I tried removing the line breaks with the following:
awk -v RS="" '{gsub (/\n/,"")}1' cleanPaperAuthor.csv > cleanPaperAuthor1.csv
That DID result in the multi-line data in the cell being displayed all on one line. But when I attempted to read in the file in java, the reader still thought that it had encountered the end of the line in the middle of the cell data.
So I tried
awk -v RS="" '{gsub (/\r/,"")}1' cleanPaperAuthor1.csv > cleanPaperAuthor2.csv
That resulted in ALL data in the .csv file being put on one line.
So then I tried
awk -v RS="" '{gsub (/\r\n/,"")}1' cleanPaperAuthor.csv > cleanPaperAuthor3.csv
I'm not sure yet if that worked - I am still in the process of opening the file.
I know there is a CSVReader class out there, but I would really like to figure out what I can do without having to deal with getting that set up and changing my code. Anyone out there have any ideas? I'm completely befuddled at this point.
Using a CSV parser is extremely easy, in both setup and API. In addition to handling values that span multiple lines, it takes care of things like commas inside quoted elements and extracting just the values inside the quotes "". You can also use the library to serialize your data back to CSV.
Here's an example with OpenCSV to read a line of csv values.
String input = "value1, \"value2\", \"value3, 1234\", \"value4\n"
+ "value5\n"
+ "value6\"";
try (CSVReader reader = new CSVReader(new StringReader(input))) {
String [] tokens;
while ((tokens = reader.readNext()) != null) {
System.out.println(Arrays.toString(tokens));
}
} catch (IOException e) {
e.printStackTrace();
}
Output ("value3, 1234" is one value):
[value1, value2, value3, 1234, value4
value5
value6]
Just make sure to add Apache Commons Lang 3.x jar to your classpath.
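If adding a library really is not an option, the core of the multi-line problem can be handled by tracking whether the reader is currently inside double quotes and ending a record only at a line break outside them. This is a rough sketch with no dependencies, assuming fields are quoted with " and embedded quotes are doubled (""); the class name is made up, and it only splits records, it does not split fields:

```java
import java.util.ArrayList;
import java.util.List;

public class CsvRecordSplitter {
    /** Splits CSV text into logical records, keeping line breaks that fall inside double quotes. */
    public static List<String> splitRecords(String csv) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                // A doubled quote ("") toggles twice, so the state ends up unchanged.
                inQuotes = !inQuotes;
                current.append(c);
            } else if ((c == '\n' || c == '\r') && !inQuotes) {
                // Record ends here; treat \r\n as a single terminator.
                if (c == '\r' && i + 1 < csv.length() && csv.charAt(i + 1) == '\n') {
                    i++;
                }
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            records.add(current.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "id,comment\n1,\"first line\nsecond line\"\n2,plain\n";
        System.out.println(splitRecords(csv).size()); // 3 records, although the text has 4 line breaks
    }
}
```

This also explains why stripping \n or \r with awk is fragile: it cannot tell a line break inside a quoted cell from one that ends a record.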
String UPLOADED_FOLDER = "/home/Rahul/Developement/Rahul/personal/uploadedfile/";
try {
// ** get the file and store at to that location **
byte[] bytes = file.getBytes();
Path path = Paths.get(UPLOADED_FOLDER + file.getOriginalFilename());
Files.write(path, bytes);
redirectAttributes.addFlashAttribute("You successfully uploaded '" + file.getOriginalFilename() + "'");
} catch (IOException e) {
e.printStackTrace();
}
try {
String fileName = file.getOriginalFilename();
System.out.println("/home/Rahul/Developement/Rahul/personal/uploadedfile/" + fileName);
String filePath = new File("/home/Rahul/Developement/Rahul/personal/uploadedfile/")
.getAbsolutePath();
boolean check = true;
File file1 = new File("/home/Rahul/Developement/Rahul/personal/uploadedfile/" + fileName);
System.out.println(file1.exists());
// TO CHECK FILE IS CSV OR NOT
if (fileName.endsWith(".csv")) {
check = true;
System.out.println("extension");
if (!fileName.isEmpty()) {
// *** to read the file from the location
// **("/home/Rahul/Developement/Rahul/personal/uploadedfile/")**
BufferedReader br = new BufferedReader(new FileReader(
"/home/Rahul/Developement/Rahul/personal/uploadedfile/" + fileName));
InputStream is = new FileInputStream(
"/home/Rahul/Developement/Rahul/personal/uploadedfile/" + fileName);
}
I've run into some problems trying to append to an existing text file.
It doesn't seem to append the line of text. So far I've got this method:
public static void addLine(File f, String line) {
try {
FileWriter fw = new FileWriter(f.getName(), true);
BufferedWriter buffer = new BufferedWriter(fw);
PrintWriter pw = new PrintWriter(buffer);
pw.println(line);
pw.close();
} catch (IOException e) {
System.err.println("IOException: " + e.getMessage());
}
}
and in my main I've got the following:
public static void main(String[] args) {
File f = new File("adresOfFile");
if (f.exists() && !f.isDirectory()) {
System.out.println("File " + f.getName() + " exists!");
System.out.println("\n" + "Path: " + f.getAbsolutePath());
System.out.println("\n" + "Parent: " + f.getParent());
System.out.println("\n" + "--------------CONTENT OF FILE-------------");
addLine(f, "");
addLine(f, "The line to append");
try {
displayContent(f);
} catch (IOException e) {
System.out.println(e.getMessage());
}
} else {
System.out.println("File not found");
}
}
When I run the program it doesn't report any errors. It should print the existing text (displayContent) after appending (addLine), but when I run it, it shows only the existing text, without the appended line.
It doesn't show up in the text file either. I put a System.out.println(); in the method, and it prints, so I know the method is running; it is just not appending.
EDIT (answer): I replaced f.getName() with f and added pw.flush() before pw.close().
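For reference, the reason the change matters: f.getName() returns only the last path component, so new FileWriter(f.getName(), true) opens a file of that name relative to the current working directory, which is a different file whenever f lives somewhere else. A sketch of the corrected method under that reading; the boolean return value and the class name are additions for illustration, not part of the original code:

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class AppendLine {
    /** Appends one line to the given file; returns false if the write failed. */
    public static boolean addLine(File f, String line) {
        // Pass the File itself so the full path is used, not just the bare file name.
        // try-with-resources closes (and thereby flushes) the writer chain.
        try (PrintWriter pw = new PrintWriter(new BufferedWriter(new FileWriter(f, true)))) {
            pw.println(line);
            // PrintWriter swallows IOExceptions; checkError() flushes and reports them.
            return !pw.checkError();
        } catch (IOException e) {
            System.err.println("IOException: " + e.getMessage());
            return false;
        }
    }
}
```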
I think that your displayContent(File) function has bugs.
The above code does append to the file.
Have a look at the file to see if anything is appended.
Also, do you need to create a PrintWriter object each time you append a line?
If many consecutive lines are to be appended, try reusing a single PrintWriter/BufferedWriter object, for example by keeping it in a static final field.
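One way to act on that suggestion without a long-lived static writer is to open the writer once per batch of lines. This is only a sketch; the class and method names are hypothetical:

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

public class BatchAppender {
    /** Appends all lines using a single writer; returns false if any write failed. */
    public static boolean appendLines(File f, List<String> lines) {
        try (PrintWriter pw = new PrintWriter(new BufferedWriter(new FileWriter(f, true)))) {
            for (String line : lines) {
                pw.println(line);
            }
            // checkError() flushes the stream and reports any swallowed IOException.
            return !pw.checkError();
        } catch (IOException e) {
            System.err.println("IOException: " + e.getMessage());
            return false;
        }
    }
}
```

Compared with a static writer, this keeps the file closed between batches, so other code (such as displayContent) sees everything that was written.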
I want to search for specific lines of text in a text file. If the piece of text I am looking for is on a specific line, I would like to read further on that line for more input.
So far I have 3 tags I am looking for.
#public
#private
#virtual
If I find any of these on a line, I would like to read what comes next so for example I could have a line like this:
#public double getHeight();
If I determine that the tag I found is #public, then I have to take the part that follows the whitespace, up to the semicolon. The problem is that I can't think of an efficient way to do this without excessive use of charAt(..), which neither looks pretty nor is likely to hold up well for a large file or for multiple files in a row.
I would like help to solve this efficiently as I currently can't comprehend how I would do it. The code itself is used to parse comments in a C++ file, to later generate a Header file. The Pseudo Code part is where I am stuck. Some people suggest BufferedReader, others say Scanner. I went with Scanner as that seems to be the replacement for BufferedReader.
public void run() {
Scanner scanner = null;
String filename, path;
StringBuilder puBuilder, prBuilder, viBuilder;
puBuilder = new StringBuilder();
prBuilder = new StringBuilder();
viBuilder = new StringBuilder();
for(File f : files) {
try {
filename = f.getName();
path = f.getCanonicalPath();
scanner = new Scanner(new FileReader(f));
} catch (FileNotFoundException ex) {
System.out.println("FileNotFoundException: " + ex.getMessage());
} catch (IOException ex) {
System.out.println("IOException: " + ex.getMessage());
}
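// Scanner.nextLine() never returns null; at end of input it throws NoSuchElementException, so test hasNextLine() instead.
while (scanner.hasNextLine()) {
String line = scanner.nextLine();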
/**
* Pseudo Code
* if #public then
* puBuilder.append(line.substring(after white space)
* + line.substring(until and including the semicolon);
*/
}
}
}
I may be misunderstanding you, but are you just looking for String.contains()?
if(line.contains("#public")){}
String tag = "";
if(line.startsWith("#public")){
tag = "#public";
} else if (line.startsWith("#private")) {
tag = "#private";
} else if (line.startsWith("#virtual")) {
tag = "#virtual";
}
line = line.substring(tag.length(), line.indexOf(";")).trim();
This gives you a string that runs from the end of the tag (#public in this case) up to the character preceding the semicolon, with the whitespace trimmed off both ends.
if (line.startsWith("#public")) {
...
}
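The tag check and the substring step can also be combined into a single regular expression, which avoids charAt loops entirely. A minimal sketch, assuming the tag may appear anywhere on the line, is followed by whitespace, and the declaration runs up to and including the first semicolon; the class name is made up:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagLineParser {
    // Group 1: the tag name. Group 2: the shortest text up to and including the first semicolon.
    private static final Pattern TAGGED =
            Pattern.compile("#(public|private|virtual)\\s+(.*?;)");

    /** Returns {tag, declaration}, or null if the line carries no recognized tag. */
    public static String[] parse(String line) {
        Matcher m = TAGGED.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] result = parse("#public double getHeight();");
        System.out.println(result[0] + " -> " + result[1]); // public -> double getHeight();
    }
}
```

The tag in result[0] can then select which StringBuilder (puBuilder, prBuilder or viBuilder) receives result[1]. Compiling the pattern once and reusing it keeps the per-line cost low for large files.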
If you are allowed to use open-source libraries, I suggest the Apache Commons IO and Commons Lang libraries. These are widely used Java libraries that will make your life a lot simpler.
String text = null;
InputStream in = null;
List<String> lines = null;
for(File f : files) {
try{
in = new FileInputStream(f);
lines = IOUtils.readLines(in);
for (String line: lines){
if (line.contains("#public")) {
// substringBetween takes the source string first, then the opening and closing markers
text = StringUtils.substringBetween(line, "#public", ";");
...
}
}
}
catch (Exception e){
...
}
finally{
// always remember to close the resource
IOUtils.closeQuietly(in);
}
}
here is the edifile:
ISA*00* 00 *02*HMES *ZZ*MGLYNNCO *120321*1220*U*00401*000015676*0*P*:~GS*FA*HMES*MGLYNNCO*20120321*1220*15691*X*004010~ST*997*000015730~AK1*SM*18292~AK2*204*182920001~AK5*A~AK9*A*1*1*1~SE*6*000015730~GE*1*15691~IEA*1*000015676~
IN JAVA
I have an EDI file that I need to parse. I can read the file, and I have converted it to a string and used a tokenizer to break it apart. What I am unsure about is that each segment ends with a second delimiter: how can I break the string apart at the segment delimiter?
public class EdiParserP {
public static void main(String[] args) {
//retrieves the file to be read
File file = new File(args[0]);
int ch;
StringBuffer strContent = new StringBuffer("");
FileReader fileInSt = null;
try{
fileInSt = new FileReader(file);
while ((ch = fileInSt.read()) != -1)strContent.append((char)ch);
fileInSt.close();
}
catch (FileNotFoundException e)
{
System.out.println("file " + file.getAbsolutePath() + " could not be found");
}
catch (IOException ioe) {
System.out.println("problem reading the file" + ioe);
}
System.out.println("File contents:" + "\n" + strContent + "\n");//used to check to see if file is there
System.out.print("Length of file" +" " + strContent.length()+ "\n"+ "\n");//used to count the length of the file
String buffFile = strContent.toString();//used to convert bufferstring to string
//breaks apart the file with the given delimiter
StringTokenizer st = new StringTokenizer(buffFile , "*");
while(st.hasMoreTokens())
{
String s =st.nextToken();
System.out.println(s);
}
}
}
My second question is how to retrieve the information so I can put it into a database. I know how to connect to the database and, I think, how to insert; I am just unsure how to pull the data out of this string. Thanks for the help.
This may not solve everything, but StringTokenizer is a legacy class whose use is discouraged in new code; use the split() method of the String class instead.
To be honest, I don't know what you really want to do with that file. Do you want to split each string you get from the StringTokenizer once again?
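To make the split() suggestion concrete: one approach is to split the whole string on the segment delimiter first, and then split each segment on the element delimiter. The sketch below takes both delimiters as parameters; "~" and "*" are read off the sample file above and are an assumption for any other file, and the class name is made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class EdiSplitter {
    /** Splits EDI text into segments, and each segment into its elements. */
    public static List<String[]> parse(String edi, String segmentDelimiter, String elementDelimiter) {
        List<String[]> segments = new ArrayList<>();
        // Pattern.quote is needed because split() takes a regex and "*" is a regex metacharacter.
        for (String segment : edi.split(Pattern.quote(segmentDelimiter))) {
            if (segment.trim().isEmpty()) {
                continue; // skip blank pieces, e.g. a trailing line break after the last segment
            }
            // Limit -1 keeps trailing empty elements instead of silently dropping them.
            segments.add(segment.split(Pattern.quote(elementDelimiter), -1));
        }
        return segments;
    }

    public static void main(String[] args) {
        String edi = "ST*997*000015730~AK1*SM*18292~AK5*A~";
        for (String[] elements : parse(edi, "~", "*")) {
            System.out.println(elements[0] + " has " + (elements.length - 1) + " data elements");
        }
    }
}
```

Each String[] then holds the segment ID at index 0 followed by its data elements, which is a convenient shape for binding values to a database insert.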
I am new to Java and trying to save a multi line string to a text file.
Right now, it does work within my application. Like, if I save the file from my application and then open it from my application, it does put a space between lines. However, if I save the file from my app and then open it in Notepad, it is all on one line.
Is there a way to make it show multi line on all programs? Here's my current code:
public static void saveFile(String contents) {
// Get where the person wants to save the file
JFileChooser fc = new JFileChooser();
int rval = fc.showSaveDialog(fc);
if(rval == JFileChooser.APPROVE_OPTION) {
File file = fc.getSelectedFile();
try {
//File out_file = new File(file);
BufferedWriter out = new BufferedWriter(new FileWriter(file));
out.write(contents);
out.flush();
out.close();
} catch(IOException e) {
messageUtilities.errorMessage("There was an error saving your file. IOException was thrown.", "File Error");
}
}
else {
// Do nothing
System.out.println("The user chose not to save anything");
}
}
Depending on how you are constructing your string, you may just be running into a line-ending problem. Notepad does not support Unix line endings (\n only); it only supports Windows line endings (\r\n). Try opening your saved file in a more robust editor, and/or make sure you are using the proper line endings for your platform. Java's system property System.getProperty("line.separator") gives you the proper line ending for the platform the code is running on.
While you're building the string to be saved to the file, rather than explicitly specifying "\n" or "\r\n" (or, on classic Mac OS, "\r") for your line endings, append the value of that system property instead.
like so:
String eol = System.getProperty("line.separator");
... somewhere else in your code ...
String texttosave = "Here is a line of text." + eol;
... more code.. optionally adding lines of text .....
// call your save file method
saveFile(texttosave);
Yes, as the previous answer mentions, use System.getProperty("line.separator").
Your code doesn't show how you created the String contents, but since you said you are new to Java, I thought I'd mention that repeatedly concatenating Strings is inefficient, since each concatenation creates a new String object. If you are building the String like this:
String contents = "";
contents = contents + "sometext" + "some more text\n";
then consider using java.lang.StringBuilder instead:
StringBuilder strBuilder = new StringBuilder();
strBuilder.append("sometext").append("some more text\n");
...
String contents = strBuilder.toString();
Another alternative is to stream whatever you're planning to write to the file, rather than building a large string and then outputting it.
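A small sketch of the streaming idea: write each line as it becomes available and let PrintWriter.println supply the platform line separator, so no large string is built and no "\n" is hard-coded. The class and method names are hypothetical; in saveFile the target would be a FileWriter for the chosen file:

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;

public class LineStreamer {
    /** Writes each line followed by the platform line separator, then flushes. */
    public static void writeLines(Writer target, Iterable<String> lines) {
        PrintWriter out = new PrintWriter(target);
        for (String line : lines) {
            out.println(line); // println appends the platform line separator
        }
        out.flush(); // the caller still owns the target and is responsible for closing it
    }

    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        writeLines(buffer, Arrays.asList("first line", "second line"));
        System.out.print(buffer);
    }
}
```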
You could add something like:
contents = contents.replace("\n", "\r\n");
if Notepad does not display the text correctly. However, you might run into a different problem: on each save/load cycle you will accumulate extra \r characters. To avoid that, on load you would have to call the same code with the parameters reversed. This is really an ugly solution just to get the text to display properly in Notepad.
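The accumulation of \r characters can be avoided by matching any existing line ending, not just \n, so that text which already uses \r\n is left unchanged. A sketch of such a normalizing step; the class and method names are made up:

```java
public class LineEndings {
    /** Converts every line ending (\r\n, \r or \n) to \r\n; applying it twice changes nothing. */
    public static String toCrlf(String text) {
        // \r\n is listed first so an existing Windows ending is matched as one unit.
        return text.replaceAll("\\r\\n|\\r|\\n", "\r\n");
    }

    public static void main(String[] args) {
        String mixed = "one\ntwo\r\nthree";
        String once = toCrlf(mixed);
        System.out.println(once.equals(toCrlf(once))); // true
    }
}
```

Note that the replacement argument is the Java string "\r\n" (real CR and LF characters). Writing it as "\\r\\n" would not work with replaceAll, because a backslash in the replacement string only escapes the following character and would produce the letters r and n.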
I had this same problem, my friend, and after much thought and research I found a solution.
You can put all the contents of the TextArea into an ArrayList, for example, and pass it as a parameter to the save method. Since the writer just writes lines of text, we loop over the ArrayList with a for loop and write it line by line; in the end the TextArea content is in the .txt file.
If something does not make sense, I'm sorry; I used Google Translate because I do not speak English.
Note that Windows Notepad does not always break lines and may show everything on one line; use WordPad instead.
private void SaveActionPerformed(java.awt.event.ActionEvent evt) {
String NameFile = Name.getText();
ArrayList< String > Text = new ArrayList< String >();
Text.add(TextArea.getText());
SaveFile(NameFile, Text);
}
public void SaveFile(String name, ArrayList<String> message) {
    path = "C:\\Users\\Paulo Brito\\Desktop\\" + name + ".txt";
    File file1 = new File(path);
    // Append every entry as its own line. FileWriter creates the file if it does not exist,
    // and newLine() writes the platform line separator.
    try (BufferedWriter bw = new BufferedWriter(new FileWriter(file1, true))) {
        for (String entry : message) {
            bw.write(entry);
            bw.newLine();
        }
    } catch (IOException ex) {
        ex.printStackTrace();
        JOptionPane.showMessageDialog(null, "Error in " + ex);
        return;
    }
    // Read the file back and print it to check what was written.
    try (BufferedReader br = new BufferedReader(new FileReader(file1))) {
        String line;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    } catch (IOException ex) {
        ex.printStackTrace();
        JOptionPane.showMessageDialog(null, "Error in " + ex);
    }
}