I have a Hashtable where each key is a URL string and each value (htmlcontent) is the HTML of that URL.
I want to write the Hashtable into a .txt file.
Can anyone suggest a solution?
How about one row for each entry, with the two strings separated by a comma? Sort of like:
"key1","value1"
"key2","value2"
...
"keyn","valuen"
Keep the quotes and you can write out keys that refer to null entries too, like
"key", null
To actually produce the table, you might want to use code similar to:
public void write(OutputStreamWriter out, Hashtable<String, String> table)
        throws IOException {
    String eol = System.getProperty("line.separator");
    for (String key : table.keySet()) {
        out.write("\"");
        out.write(key);
        out.write("\",\"");
        out.write(String.valueOf(table.get(key)));
        out.write("\"");
        out.write(eol);
    }
    out.flush();
}
For the I/O part, you can use a new PrintWriter(new File(filename)). Just call the println methods like you would System.out, and don't forget to close() it afterward. Make sure you handle any IOException gracefully.
If you have a specific format, you'd have to explain it, but otherwise a simple for-each loop on the Hashtable.entrySet() is all you need to iterate through the entries of the Hashtable.
By the way, if you don't need the synchronized feature, a HashMap<String,String> would probably be better than a Hashtable.
Here's a simple example putting things together, omitting robust IOException handling for clarity and using a simple format:
import java.io.*;
import java.util.*;

public class HashMapText {
    public static void main(String[] args) throws IOException {
        //PrintWriter out = new PrintWriter(System.out);
        PrintWriter out = new PrintWriter(new File("map.txt"));

        Map<String,String> map = new HashMap<String,String>();
        map.put("1111", "One");
        map.put("2222", "Two");
        map.put(null, null);

        for (Map.Entry<String,String> entry : map.entrySet()) {
            out.println(entry.getKey() + "\t=>\t" + entry.getValue());
        }
        out.close();
    }
}
Running this on my machine generates a map.txt containing three lines:
null => null
2222 => Two
1111 => One
As a bonus, you can switch to the first (commented-out) declaration of out and print the same output to standard output instead of a text file.
See also
Difference between java.io.PrintWriter and java.io.BufferedWriter?
java.io.PrintWriter API
Methods in this class never throw I/O exceptions, although some of its constructors may. The client may inquire as to whether any errors have occurred by invoking checkError().
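For instance, building on the map.txt example above (a sketch, with error handling deliberately minimal):
PrintWriter out = new PrintWriter(new File("map.txt"));
out.println("1111\t=>\tOne");
out.close();
if (out.checkError()) {
    // PrintWriter swallowed any IOException; checkError() is the only signal
    System.err.println("Writing map.txt failed");
}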
For a text representation, I would recommend picking a few characters that are very unlikely to occur in your strings, then outputting a CSV-format file with those characters as separator, quote, terminator, and escape. Each row (as delimited by the terminator, since either string might contain line-ending characters) would hold two CSV fields: the key of a hashtable entry first, and its value second.
A simpler approach along the same lines is to designate one arbitrary character, say the backslash \, as the escape character. You'll have to double up backslashes where they occur in either string, and express tabs (\t) and line-ends (\n) in escape form; then you can use a real (not escape-sequence) tab character as the field separator between the two fields (key and value), and a real (not escape-sequence) line-end at the end of each row, as sketched below.
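Here is a minimal sketch of that backslash-escaping scheme; the method names are illustrative, not from any library:
// Double up backslashes first, then escape real tabs and line-ends, so the
// real tab and newline emitted below remain unambiguous delimiters.
private static String escape(String s) {
    return s.replace("\\", "\\\\")
            .replace("\t", "\\t")
            .replace("\n", "\\n")
            .replace("\r", "\\r");
}

// One row per entry: escaped key, a real tab, escaped value, a real line-end.
private static String toRow(String key, String value) {
    return escape(key) + "\t" + escape(value) + "\n";
}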
You can try
public static void save(String filename, Map<String, String> hashtable) throws IOException {
    Properties prop = new Properties();
    prop.putAll(hashtable);
    FileOutputStream fos = new FileOutputStream(filename);
    try {
        prop.store(fos, null); // the second argument is an optional comment String
    } finally {
        fos.close();
    }
}
This stores the hashtable (or any Map) as a properties file. You can use the Properties class to load the data back in again.
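A sketch of the reverse direction, assuming the file was written by save() above; Properties.load() parses the stored lines back in:
public static Map<String, String> load(String filename) throws IOException {
    Properties prop = new Properties();
    FileInputStream fis = new FileInputStream(filename);
    try {
        prop.load(fis); // reads the key=value lines written by store()
    } finally {
        fis.close();
    }
    Map<String, String> map = new HashMap<String, String>();
    for (String name : prop.stringPropertyNames()) {
        map.put(name, prop.getProperty(name));
    }
    return map;
}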
import java.io.*;
import java.util.*;

class FileWrite
{
    public static void main(String args[])
    {
        Hashtable<String, String> table = new Hashtable<String, String>(); // get the table from wherever
        BufferedWriter writer = null;
        try {
            // Create file
            writer = new BufferedWriter(new FileWriter("out.txt"));
            writer.write(table.toString());
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                if (writer != null) writer.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
Since you don't have any requirements for the file format, I would not create a custom one. Just use something standard. I would recommend JSON for that!
Alternatives include XML and CSV, but I think JSON is the best option here. CSV doesn't handle complex types (like a list in one of the keys of your map), and XML can be quite complex to encode/decode.
Using json-simple as an example:
String serialized = JSONValue.toJSONString(yourMap);
and then just save the string to your file (which is not specific to your domain either; using Apache Commons IO):
FileUtils.writeStringToFile(new File(yourFilePath), serialized);
To read the file:
Map map = (JSONObject) JSONValue.parse(FileUtils.readFileToString(new File(yourFilePath)));
You can use another JSON library as well, but I think this one fits your need.
I have noticed a behavior of the output stream classes (BufferedOutputStream) and I want to understand it to resolve my issue. When I create an object to write text data to a file, it is OK; but when I try to write again with another object of the same class, it works but replaces the previous text with the new one.
class writeFile extends BufferedOutputStream {
    public writeFile(OutputStream out) {
        super(out);
    }

    public static void main(String arg[]) throws FileNotFoundException, IOException {
        // 'file' is the java.io.File being written to
        new writeFile(new FileOutputStream(file)).setComments("hello");
        new writeFile(new FileOutputStream(file)).setComments("Hi");
    }

    public void setComments(String s) throws IOException {
        // setBytes converts the String to a byte[]
        this.write(this.setBytes(s + "\r\n\n"));
        this.write(this.setBytes("-----------------------------------\r\n\n"));
    }
When I execute it, I find just the word Hi; the first word is gone because it was replaced by the last one. Why does writing with another object start from the beginning of the file and overwrite what came before, and is there any solution? When I close the program and open it again, there will be a new object declaration, and that counts as a new object.
There is a FileOutputStream(File, boolean) constructor (plus a (String, boolean) overload), where the second parameter tells it to append. Easiest fix I see: change
new writeFile(new FileOutputStream(file)).setComments("Hi");
to
new writeFile(new FileOutputStream(file, true)).setComments("Hi");
Personally, I think it would be better to use one OutputStream (and your writeFile is one possible such class). And you should always close your resources (you could use a try-with-resources). Finally, Java naming conventions have classes start with a capital letter; writeFile looks like a method name.
try (writeFile fos = new writeFile(new FileOutputStream(file))) {
    fos.setComments("hello");
    fos.setComments("Hi");
}
What are the advantages of using NullWritable for null keys/values over using null Texts (i.e. new Text(null))? I see the following in the «Hadoop: The Definitive Guide» book:
NullWritable is a special type of Writable, as it has a zero-length serialization. No bytes
are written to, or read from, the stream. It is used as a placeholder; for example, in
MapReduce, a key or a value can be declared as a NullWritable when you don’t need
to use that position—it effectively stores a constant empty value. NullWritable can also
be useful as a key in SequenceFile when you want to store a list of values, as opposed
to key-value pairs. It is an immutable singleton: the instance can be retrieved by calling
NullWritable.get()
I do not clearly understand how the output is written out using NullWritable. Will there be a single constant value at the beginning of the output file indicating that the keys or values of this file are null, so that the MapReduce framework can skip reading the null keys/values (whichever is null)? Also, how are null Texts actually serialized?
Thanks,
Venkat
The key/value types must be given at runtime, so anything writing or reading NullWritables will know ahead of time that it will be dealing with that type; there is no marker or anything in the file. And technically the NullWritables are "read", it's just that "reading" a NullWritable is actually a no-op. You can see for yourself that there's nothing at all written or read:
NullWritable nw = NullWritable.get();
ByteArrayOutputStream out = new ByteArrayOutputStream();
nw.write(new DataOutputStream(out));
System.out.println(Arrays.toString(out.toByteArray())); // prints "[]"
ByteArrayInputStream in = new ByteArrayInputStream(new byte[0]);
nw.readFields(new DataInputStream(in)); // works just fine
And as for your question about new Text(null), again, you can try it out:
Text text = new Text((String)null);
ByteArrayOutputStream out = new ByteArrayOutputStream();
text.write(new DataOutputStream(out)); // throws NullPointerException
System.out.println(Arrays.toString(out.toByteArray()));
Text will not work at all with a null String.
I changed the run method, and it succeeded:
@Override
public int run(String[] strings) throws Exception {
    Configuration config = HBaseConfiguration.create();
    //set job name
    Job job = new Job(config, "Import from file ");
    job.setJarByClass(LogRun.class);
    //set map class
    job.setMapperClass(LogMapper.class);
    //set output format and output table name
    //job.setOutputFormatClass(TableOutputFormat.class);
    //job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "crm_data");
    //job.setOutputKeyClass(ImmutableBytesWritable.class);
    //job.setOutputValueClass(Put.class);
    TableMapReduceUtil.initTableReducerJob("crm_data", null, job);
    job.setNumReduceTasks(0);
    TableMapReduceUtil.addDependencyJars(job);
    FileInputFormat.addInputPath(job, new Path(strings[0]));
    int ret = job.waitForCompletion(true) ? 0 : 1;
    return ret;
}
You can always wrap your string in your own Writable class and include a boolean indicating whether it holds a blank string or not:
@Override
public void readFields(DataInput in) throws IOException {
    ...
    boolean hasWord = in.readBoolean();
    if (hasWord) {
        word = in.readUTF();
    }
    ...
}
and
@Override
public void write(DataOutput out) throws IOException {
    ...
    boolean hasWord = StringUtils.isNotBlank(word);
    out.writeBoolean(hasWord);
    if (hasWord) {
        out.writeUTF(word);
    }
    ...
}
I am writing an algorithm to extract likely keywords from a document's text. I want to count instances of words and take the top 5 as keywords. Obviously, I want to exclude "insignificant" words lest every document appear with "the" and "and" as major keywords.
Here is the strategy I've successfully used for testing:
exclusions = new ArrayList<String>();
Collections.addAll(exclusions, "a", "and", "the", "or");
Now that I want to do a real-life test, my exclusion list is close to 200 words long, and I'd LOVE to be able to do something like this:
exclusions = new ArrayList<String>();
exclusions.add(each word in foo.txt);
Long term, maintaining an external list (rather than a list embedded in my code) is desirable for obvious reasons. With all the file read/write methods out there in Java, I'm fairly certain that this can be done, but my search results have come up empty...I know I've got to be searching on the wrong keywords. Anyone know an elegant way to include an external list in processing?
This does not immediately address the solution you are prescribing but might give you another avenue that might be better.
Instead of deciding in advance what is useless, you could count everything and then filter out what you deem is insignificant (from a information carrying standpoint) because of its overwhelming presence. It is similar to a low-pass filter in signal processing to eliminate noise.
So, in short: count everything. Then decide that anything appearing with a frequency above a threshold you set is insignificant (you'll have to determine that threshold by experiment; if, say, 5% of all words are 'the', then 'the' carries no information).
If you do it this way, it'll even work for foreign languages.
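A rough sketch of that counting-and-filtering idea, assuming words holds every token of the document (the 5% threshold is illustrative and should be tuned by experiment):
static Map<String, Integer> significantCounts(List<String> words) {
    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (String word : words) {
        Integer c = counts.get(word);
        counts.put(word, c == null ? 1 : c + 1);
    }
    int total = words.size();
    Iterator<Map.Entry<String, Integer>> it = counts.entrySet().iterator();
    while (it.hasNext()) {
        if ((double) it.next().getValue() / total > 0.05) {
            it.remove(); // too frequent to carry information
        }
    }
    return counts; // take the top 5 remaining counts as keywords
}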
Just my two cents on this.
You can use a FileReader to read the Strings out of the file and add them to an ArrayList.
private List<String> createExclusions(String file) throws IOException {
    BufferedReader reader = new BufferedReader(new FileReader(file));
    String word = null;
    List<String> exclusions = new ArrayList<String>();
    while ((word = reader.readLine()) != null) {
        exclusions.add(word);
    }
    reader.close();
    return exclusions;
}
Then you can use List<String> exclusions = createExclusions("exclusions.txt"); to create the list.
Not sure if it is elegant, but some years ago I created a simple solution that detects the language of, or removes noise words from, tweets:
TweetDetector.java
JTweet.java, which uses word lists like this for English
The Google Guava library contains lots of useful methods that simplify routine tasks. You can use one of them to read the file contents into a string and split it on whitespace:
String contents = Files.toString(new File("foo.txt"), Charset.defaultCharset());
List<String> exclusions = Lists.newArrayList(contents.split("\\s"));
Apache Commons IO provides similar shortcuts:
String contents = FileUtils.readFileToString(new File("foo.txt"));
...
Commons-io has utilities that support this. Include commons-io as a dependency, then issue
File myFile = ...;
List<String> exclusions = FileUtils.readLines( myFile );
as described in:
http://commons.apache.org/io/apidocs/org/apache/commons/io/FileUtils.html
This assumes that every exclusion word is on a new line.
Reading from a file is pretty simple.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;

public class ExcludeExample {
    public static HashSet<String> readExclusions(File file) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(file));
        String line = "";
        HashSet<String> exclusions = new HashSet<String>();
        while ((line = br.readLine()) != null) {
            exclusions.add(line);
        }
        br.close();
        return exclusions;
    }

    public static void main(String[] args) throws IOException {
        File foo = new File("foo.txt");
        HashSet<String> exclusions = readExclusions(foo);
        System.out.println(exclusions.contains("the"));
        System.out.println(exclusions.contains("Java"));
    }
}
foo.txt
the
a
and
or
I used a HashSet instead of an ArrayList because it has faster lookup.
I need to be able to re-use a java.io.InputStream multiple times, and I figured the following code would work, but it only works the first time.
Code
public class Clazz
{
    private java.io.InputStream dbInputStream, firstDBInputStream;
    private ArrayTable db;

    public Clazz(java.io.InputStream defDB) throws java.io.IOException
    {
        this.firstDBInputStream = defDB;
        this.dbInputStream = defDB;
        if (defDB == null)
            throw new java.io.FileNotFoundException("Could not find the database at " + defDB);
        if (dbInputStream.markSupported())
            dbInputStream.mark(Integer.MAX_VALUE);
        loadDatabaseToArrayTable();
    }

    public final void loadDatabaseToArrayTable() throws java.io.IOException
    {
        this.dbInputStream = firstDBInputStream;
        if (dbInputStream.markSupported())
            dbInputStream.reset();
        java.util.Scanner fileScanner = new java.util.Scanner(dbInputStream);
        String CSV = "";
        for (int i = 0; fileScanner.hasNextLine(); i++)
            CSV += fileScanner.nextLine() + "\n";
        db = ArrayTable.createArrayTableFromCSV(CSV);
    }

    public void reloadDatabase() //A method called by the UI
    {
        try
        {
            loadDatabaseToArrayTable();
        }
        catch (Throwable t)
        {
            //Alert the user that an error has occurred
        }
    }
}
Note that ArrayTable is a class of mine, which uses arrays to give an interface for working with tables.
Question
In this program, the database is shown directly to the user immediately after the reloadDatabase() method is called, and so any solution involving saving the initial read to an object in memory is useless, as that will NOT refresh the data (think of it like a browser; when you press "Refresh", you want it to fetch the information again, not just display the information it fetched the first time). How can I read a java.io.InputStream more than once?
You can't necessarily read an InputStream more than once. Some implementations support it, some don't. What you are doing is checking the markSupported method, which is indeed an indicator of whether you can read the same stream twice, but then you are ignoring the result. You have to call that method to see if you can read the stream twice, and if you can't, make other arrangements.
Edit (in response to comment): When I wrote my answer, my "other arrangements" meant getting a fresh InputStream. However, having read the comments on your question about what you want to do, I'm not sure it is possible. For the basics of the operation you probably want RandomAccessFile (at least that would be my first guess, and if it worked it would be the easiest), but you will have file-access issues: with one application actively writing a file and another reading it, you will have problems, and exactly which problems will depend on the OS, so any solution would require more testing. I suggest a separate question on SO that hits on that point; someone who has tried it out can perhaps give you more insight.
You never mark the stream to be reset:
public Clazz(java.io.InputStream defDB)
{
    firstDBInputStream = defDB.markSupported() ? defDB : new BufferedInputStream(defDB);
    //BufferedInputStream supports marking
    firstDBInputStream.mark(500000); //avoid IOException on first reset
}

public final void loadDatabaseToArrayTable() throws java.io.IOException
{
    this.dbInputStream = firstDBInputStream;
    dbInputStream.reset();
    dbInputStream.mark(500000); //or however long the data is
    java.util.Scanner fileScanner = new java.util.Scanner(dbInputStream);
    StringBuilder CSV = new StringBuilder(); //StringBuilder is more efficient in a loop
    while (fileScanner.hasNextLine())
        CSV.append(fileScanner.nextLine()).append("\n");
    db = ArrayTable.createArrayTableFromCSV(CSV.toString());
}
However, you could instead keep a copy of the original ArrayTable and copy that when you need to (or even keep the created string and rebuild from it).
This code creates the string and caches it, so you can safely discard the input streams and just use readCSV to rebuild the ArrayTable:
private String readCSV = null;

public final void loadDatabaseToArrayTable() throws java.io.IOException
{
    if (readCSV == null) {
        this.dbInputStream = firstDBInputStream;
        java.util.Scanner fileScanner = new java.util.Scanner(dbInputStream);
        StringBuilder CSV = new StringBuilder(); //StringBuilder is more efficient in a loop
        while (fileScanner.hasNextLine())
            CSV.append(fileScanner.nextLine()).append("\n");
        readCSV = CSV.toString();
        fileScanner.close();
    }
    db = ArrayTable.createArrayTableFromCSV(readCSV);
}
However, if you want new information, you'll need to create a new stream to read from again, as sketched below.
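A sketch of that last point, using a hypothetical StreamFactory interface (the name and the file path are illustrative, not from the question) so that every reload opens a brand-new stream:
// Hand the class a way to reopen the source instead of a single stream.
public interface StreamFactory {
    java.io.InputStream open() throws java.io.IOException;
}

// At the construction site, decide how a fresh stream is obtained:
StreamFactory factory = new StreamFactory() {
    public java.io.InputStream open() throws java.io.IOException {
        return new java.io.FileInputStream("database.csv"); // hypothetical path
    }
};

// Each reload then reads fresh data instead of resetting an old stream:
java.io.InputStream dbInputStream = factory.open();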
I'm calling a method from an external library with a (simplified) signature like this:
public class Alien
{
    // ...
    public void munge(Reader in, Writer out) { ... }
}
The method basically reads a String from one stream and writes its results to the other. I have several strings which I need processed by this method, but none of them exist in the file system. The strings can get quite long (ca 300KB each). Ideally, I would like to call munge() as a filter:
public void myMethod(ArrayList<String> strings)
{
    for (String s : strings) {
        String result = alienObj.mungeString(s);
        // do something with result
    }
}
Unfortunately, the Alien class doesn't provide a mungeString() method, and wasn't designed to be inherited from. Is there a way I can avoid creating two temporary files every time I need to process a list of strings? Like, pipe my input to the Reader stream and read it back from the Writer stream, without actually touching the file system?
I'm new to Java, please forgive me if the answer is obvious to professionals.
You can easily avoid temporary files by using any/all of these:
CharArrayReader / CharArrayWriter
StringReader / StringWriter
PipedReader / PipedWriter
A sample mungeString() method could look like this:
public String mungeString(String input) {
    StringWriter writer = new StringWriter();
    alienObj.munge(new StringReader(input), writer);
    return writer.toString();
}
StringReader
StringWriter
If you are willing to work with in-memory arrays the way you would in C#, then I think PipedWriter and PipedReader are the most convenient way to do so. Check this:
Is it possible to avoid temp files when a Java method expects Reader/Writer arguments?
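For completeness, a sketch of the piped variant, reusing alienObj from the question; the feeding has to happen on its own thread, or the pipe can deadlock once its internal buffer fills up:
public String mungeStringPiped(final String input) throws IOException {
    final PipedWriter feed = new PipedWriter();
    PipedReader in = new PipedReader(feed);
    // Write the input on a separate thread; writing and reading a pipe on
    // the same thread deadlocks once the pipe's buffer is full.
    new Thread(new Runnable() {
        public void run() {
            try {
                feed.write(input);
                feed.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }).start();
    StringWriter out = new StringWriter();
    alienObj.munge(in, out);
    return out.toString();
}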