Variable Append Java - java

I'm using Java to write a data mining application.
To do so, I read an ARFF file and run it against a model created in WEKA.
Currently I have an ARFF file with one line of data to process, and it works well.
What I'm trying to achieve is this: the user enters some information through a JCheckBox or a similar control, and I use it to run the model. I've thought of two ways to do it.
1. Read the file without its last line, and append the user's line directly to the in-memory variable.
2. Delete the data line (always at the same position) from the ARFF file, build a line from the user's input, write it to the ARFF file, and then read the file again.
Any suggestions on which is better (I think option 1), and in general how to do it?
I tried some code with StringBuilder but had no success.
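A minimal sketch of option 1, assuming the line to replace is the last non-blank line of the ARFF file. The class and method names (ArffInMemory, withReplacedLastLine) are hypothetical and not part of WEKA; the file itself is never modified.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class ArffInMemory {
    // Returns the ARFF text with its last data line replaced by userLine.
    // Assumption: the line to replace is the last non-blank line of the file.
    static String withReplacedLastLine(Path arffFile, String userLine) throws IOException {
        List<String> lines = Files.readAllLines(arffFile);
        int last = lines.size() - 1;
        while (last >= 0 && lines.get(last).trim().isEmpty()) {
            last--; // skip trailing blank lines
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < last; i++) {
            sb.append(lines.get(i)).append('\n'); // header and earlier lines, unchanged
        }
        sb.append(userLine).append('\n'); // the line built from user input
        return sb.toString();
    }
}
```

The resulting string can be wrapped in a java.io.StringReader. WEKA's Instances class should accept a Reader in its constructor, which would avoid writing the file back to disk, but check this against the WEKA version in use.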

Related

Java CSV file update

I would like to update a .csv file containing 1 million records. I want to update it in place, without creating another file. I am using Opencsv, but I can't find any way to update the .csv file. If anyone can help me out, that would be great.
For clarification, take this small .csv file:
Initial csv file:
a,b,c,d
1,2,3,4
5,6,7,8
12,13,14,15
Desired csv file:
a,b,c,d,e,f
1,2,3,4, ,17
5,6,7,8,16,
12,13,14,15,
Basically, you cannot do that.
Because CSV does not use fixed-length records, most changes require moving data up or down in the file. That cannot be done without rewriting the file from the point of the change onwards. For example, given:
1,2,3,4
5,6,7,8
...
Changing the 8 to a 10 would require every byte from that location onwards in the whole file to be moved up one position.
To achieve the effect of editing the file, the usual approach is to copy the file to a new name, making the change you want during the copy. You then rename the files so that the new one replaces the old one.
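The copy-then-rename approach could be sketched as follows. This is an illustration under assumptions rather than a definitive implementation: CsvRewrite and editLine are hypothetical names, the edit works on raw lines instead of going through Opencsv, and it does not handle quoted fields that span several lines.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.UnaryOperator;

class CsvRewrite {
    // Streams csvFile line by line into a temporary file in the same directory,
    // applying editLine to each line, then renames the temp file over the original.
    static void rewrite(Path csvFile, UnaryOperator<String> editLine) throws IOException {
        Path dir = csvFile.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "rewrite", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(csvFile, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.write(editLine.apply(line)); // only one line is held in memory at a time
                out.newLine();
            }
        }
        // Replace the original with the edited copy.
        Files.move(tmp, csvFile, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

For example, `CsvRewrite.rewrite(path, line -> line.equals("5,6,7,8") ? "5,6,7,10" : line)` would change the 8 to a 10. Because the file is streamed, a million records should not need to fit in memory.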
Normally the file system doesn't let you update a file in place like this; you need to rewrite the file. Even though everything appears to be in one file, it is written in multiple segments, so changing the file in place would damage those segments. You need to rewrite everything.

Extract flat files contents into individual words and store into database

I've done a lot of internet searching for information, to no avail. Hopefully you can help me.
I want to take a flat file with normal content (i.e. full English sentences, paragraphs, etc.), extract each word, and store each word individually, one word per row, in a SQL database. (It doesn't matter if there are spaces, and characters such as apostrophes can be kept in.)
I then want an HTML page with code to access this DB and output the text to the user one word at a time, essentially 'writing' the input file's text word by word on the web page.
This is just a coding exercise, but I am frustrated because I know the what but not the how, and I am not sure where to start. Please note that some of these files can be quite big (about 20,000 words), so there may be a performance element to consider in any solution.
TL;DR: I want to extract individual words from a text file of normal everyday sentences into a SQL DB that I can retrieve from an HTML page.
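This is a simple read-and-split exercise:
filename = "input.txt"  # placeholder path to the text file
dd = {}
with open(filename) as f:
    for ln in f:
        wds = ln.strip().split()
        for word in wds:
            dd[word] = 1  # value is unused; the dict keeps each distinct word once
for wkey in dd:
    pass  # insert wkey into the db here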
Well, before you start, you should choose just one programming language. Since you seem to be a beginner, I would highly recommend Python over Java, but it depends on whether you're required to use a particular language by an employer, professor, etc.
Also just to point out, this is also a very BIG task that you've chosen. I'll try to break it down into parts for you, but I recommend starting with just one of these parts before you move on, and make sure it works on your local machine before you try putting it on the web.
First you need something to read in your file, preferably line by line. FileReader/BufferedReader in Java, or the open() and readlines() functions in Python, will do this. I would also check out the online tutorials on file handling for whichever of these two languages you're going to use. The Python one is here. Practice this with a test file or a small section of your real file before you start working on your real input files.
When you start processing the lines from the file, I would recommend splitting them into individual words using a string split function on spaces or on any punctuation, such as ,.!?". This way you'll pull out the individual words from each line in the file.
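For the Java route, the splitting step might look like the sketch below. WordSplitter is a hypothetical helper name, and the separator set (whitespace plus , . ! ? ") is taken from the list above; apostrophes are deliberately left alone so that they stay in the stored words.

```java
import java.util.ArrayList;
import java.util.List;

class WordSplitter {
    // Splits a line into words on whitespace and the punctuation , . ! ? "
    // Apostrophes are not separators, so words like "don't" stay whole.
    static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        for (String token : line.split("[\\s,.!?\"]+")) {
            if (!token.isEmpty()) { // split() can yield a leading empty token
                result.add(token);
            }
        }
        return result;
    }
}
```

Unlike a set or dictionary, the returned list keeps duplicates and word order, which matters if the text is later to be replayed word by word.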
Next, you'll want to choose a database API for your programming language. I used PyMySQL, but there is also MySQLdb for Python. In Java there is JDBC.
You'll need to then build your database on a server somewhere, preferably on the same server as your HTML page for ease of connection. You'll want to practice connecting to your database and adding sample rows before you start trying to process your real input files.
You can't have normal HTML access the database directly - you'll need to use a coding language like Python for that. I've never used Java for webpages, but with Python you'll simply output text and tell the server to display it as the webpage. This will do the trick:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import otherstuffhere
## Must have this header to tell browser how to handle this output
## and must be printed first
print ("Content-Type: text/html\n\n")
## Connect to database here
## Your code to display words from the database goes below here
print (myfield1)
Also remember that when you output your text, you'll need to add all the HTML tags to the normal text output. For example, when printing each word, you'll need to add <p> or <br> to end each line, because although the Python print() function will automatically add a line break, this doesn't translate to a line break in HTML. For example:
print("My word list is: <br>")
for word in dbOutputList:
    print(word)
    print("<br>")
After that the REAL fun/crying begins, but you should work on the above before you move on.

Need help using java threads to download file parts

I am trying to download a file from a server in a user-specified number of parts (n). So a file of x bytes is divided into n parts, and each part is downloaded at the same time as the others. I am using threads to implement this, but I have not worked with HTTP before and do not really understand how downloading a file works. I have read up on it, and it seems the "Range" header needs to be used, but I do not know how to download the different parts and put them together without corrupting the data.
(Since it's a homework assignment I will only give you a hint)
Appending to a single file will not help you at all, since this will mess up the data. You have two alternatives:
1. Download from each thread to a separate temporary file and then merge the temporary files in the right order to create the final file. This is probably easier to conceive, but a rather ugly and inefficient approach.
2. Do not stick to the usual stream-style semantics: use random access to write data from each thread straight to the right location within the output file.
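The random-access alternative could be sketched like this, leaving the HTTP part out. PartWriter and its methods are hypothetical names, and the sketch assumes the total size is known up front and is at least n bytes. The offsets from range() are inclusive, which is also the form an HTTP Range header of "bytes=start-end" uses.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

class PartWriter {
    // Returns {start, endInclusive} byte offsets of part i when a file of
    // totalBytes is split into n parts; the last part absorbs the remainder.
    static long[] range(long totalBytes, int n, int i) {
        long size = totalBytes / n;
        long start = i * size;
        long end = (i == n - 1) ? totalBytes - 1 : start + size - 1;
        return new long[] {start, end};
    }

    // Writes one downloaded part at its own offset. Each call opens its own
    // RandomAccessFile, so threads do not share a file pointer.
    static void writePart(Path target, long offset, byte[] data) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(target.toFile(), "rw")) {
            raf.seek(offset);
            raf.write(data);
        }
    }
}
```

Each thread would fetch its own range and call writePart with the start offset of that range. Because the parts never overlap, the order in which the threads finish does not matter.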

reading a file while it's being written

I've read some posts on stackoverflow about this topic but I'm still confused. When reading a file that is currently being written in Java, how do you keep track of how many lines have actually been written so that you don't get weird read results?
EDIT: sorry, I should have mentioned that the program writing the file is in C++ and the one reading it is in Java, so variables can't really be shared easily.
When reading a file that is currently being written in Java, how do you keep track of how many lines have actually been written so that you don't get weird read results?
The problem is that you can never be sure that the current last character of the file is the end of a line. If it is a line terminator, you are OK. If it is not, BufferedReader.readLine() will interpret it as a complete line without a line terminator ... and weird results will ensue.
What you need to do is to implement your own line buffering. When you get an EOF you wait until the file grows some more and then resume reading the line.
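One way to sketch such line buffering is shown below. CompleteLineReader and poll() are hypothetical names; the sketch assumes the writer only ever appends, terminates lines with '\n', and writes UTF-8. The caller decides how long to wait between polls.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class CompleteLineReader {
    private final Path file;
    private long position = 0; // offset of the first byte not yet consumed
    private final ByteArrayOutputStream partial = new ByteArrayOutputStream();

    CompleteLineReader(Path file) {
        this.file = file;
    }

    // Returns only the lines terminated by '\n' since the last call.
    // An unterminated tail stays buffered until the writer finishes it.
    List<String> poll() throws IOException {
        List<String> lines = new ArrayList<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(position);
            byte[] buf = new byte[8192];
            int n;
            while ((n = raf.read(buf)) > 0) {
                position += n;
                for (int i = 0; i < n; i++) {
                    if (buf[i] == '\n') {
                        lines.add(partial.toString(StandardCharsets.UTF_8.name()));
                        partial.reset();
                    } else if (buf[i] != '\r') { // drop carriage returns
                        partial.write(buf[i]);
                    }
                }
            }
        }
        return lines;
    }
}
```

A reader loop would call poll() repeatedly, sleeping briefly whenever it returns an empty list. Counting the returned lines gives the number of complete lines written so far, with no shared variable between the C++ and Java processes.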
Alternatively, if you are using Java 7 or later, the file watcher APIs allow you to watch for file writes without polling the file's size.
By the way, there is an Apache commons class that is designed for doing this kind of thing:
http://commons.apache.org/io/api-2.0/org/apache/commons/io/input/Tailer.html
If I understand correctly, the file is being written by a C++ process, and a separate Java process wants to read it while it is being written.
Look at the File Monitoring section on the tail command here. But I want to warn you that when I recently used the Cygwin tail on Windows to follow log files that were rolling over, it sometimes failed under heavy load. Other implementations may be more robust.
To have a count of the number of lines, just keep a counter on the side that's doing the writing.
So, every time you write a line, increment a counter, and make that counter readable via a method, something like, public int getNumLinesWritten()
The obvious answer to me... Why not use a buffer? Use a string or whatever you need. (You could use a list/array of strings if you want, one for each line maybe?) Append to the string just as you would write to the file, then instead of reading from the file, read from that string. Would that work for you?

delete single line in a text file?

I have created a J2ME application that reads and writes a text file.
When reading, I read one line and send it to the server. After that, I want to remove that line from the text file.
I can't work out how to do it. In some examples I found, the solution was to copy the original file's content into an object, remove the string from that object, delete the original file, and create a new one from the object.
I don't think that is a good approach. Is there any other way to do it?
Edit:
Actually, the problem is this: one application writes data to the text file, and my other application reads one line, sends it to the server, and removes that line.
If I copy the content to a new object, delete the file, and write a new file from that object, I run into a problem:
if the file is deleted, the first application can't find it, so it may create a new file
containing only one record, and the second application will then create a new file from its object,
so my data will be lost.
Edit:
I even tried to do the same thing with RMS, but when both applications access the same RMS at the same time, all data in the RMS file is cleared. The first application opens the RMS for writing and the second opens it for sync and delete, but when both have the RMS open, all the data is cleared.
Is it possible to set a lock on the RMS file from one application?
No, that's how you do it.
You can't delete a line from the beginning of a file. You would need to re-write the file without that line.
(Note that this is not specific to java)
As records are inserted, I now create a single file per record in one specific folder.
When the background application reads a file and sends it to the server, it deletes that file.
This solves the concurrency problem with reading and writing the file.
I know it is not a good approach, but I didn't find a better one.
Most file systems don't have a mechanism for deleting stuff in the middle of a file (I'm pretty sure that's the case in J2ME). So the standard practice is to open a new file, copy the old file up to the point where the unwanted line starts, skip that line, then copy the rest of the file. I know it sounds inelegant, but that's just how it is :)
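The copy-and-skip practice might be sketched as below. Note the assumptions: this uses java.nio.file from Java SE 7+, which is not available on J2ME, LineRemover is a hypothetical name, and the sketch does nothing about a second application writing to the same file at the same time.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class LineRemover {
    // Copies file to a temp file, skipping the line at lineIndex (0-based),
    // then moves the copy over the original.
    static void removeLine(Path file, int lineIndex) throws IOException {
        Path tmp = Files.createTempFile(file.toAbsolutePath().getParent(), "copy", ".tmp");
        try (BufferedReader in = Files.newBufferedReader(file);
             BufferedWriter out = Files.newBufferedWriter(tmp)) {
            String line;
            for (int i = 0; (line = in.readLine()) != null; i++) {
                if (i != lineIndex) { // every line except the unwanted one is copied
                    out.write(line);
                    out.newLine();
                }
            }
        }
        Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Calling `LineRemover.removeLine(path, 0)` after a line has been sent to the server would drop the first line and keep the rest.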
