Tokenize Thai sentence with ICUTokenizer JAVA

Tokenize Thai sentence with ICUTokenizer JAVA - java

I am trying the below code to get all the tokens fro the thai sentence.
It throws exception. Can anyone point me to tokenize thai in JAVA?
import org.apache.lucene.analysis.Analyzer.TokenStreamComponents;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.icu.ICUNormalizer2Filter;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
public class Tokenizer{
public static void main(String[] args) throws IOException {
ICUTokenizer tokenizer = new ICUTokenizer(new StringReader("การที่ได้ต้องแสดงว่างานดี"));
TokenFilter filter = new ICUNormalizer2Filter(tokenizer);
TokenStreamComponents tt = new TokenStreamComponents(tokenizer, filter);
TokenStream ts = tt.getTokenStream();
CharTermAttribute cattr = ts.addAttribute(CharTermAttribute.class);
ts.reset();
while(ts.incrementToken()){
System.out.println(cattr.toString()+"-----");
}
}
}
Exception is as below
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.lucene.analysis.icu.segmentation.ICUTokenizer.<init>(ICUTokenizer.java:72)
at com.tokenizer.tt.main(tt.java:22)
Caused by: java.lang.RuntimeException: java.io.IOException: ICU data file error: Not an ICU data file
at org.apache.lucene.analysis.icu.segmentation.DefaultICUTokenizerConfig.readBreakIterator(DefaultICUTokenizerConfig.java:128)
at org.apache.lucene.analysis.icu.segmentation.DefaultICUTokenizerConfig.<clinit>(DefaultICUTokenizerConfig.java:66)
... 2 more
Caused by: java.io.IOException: ICU data file error: Not an ICU data file
at com.ibm.icu.impl.ICUBinary.readHeader(ICUBinary.java:577)
at com.ibm.icu.text.RBBIDataWrapper.get(RBBIDataWrapper.java:173)
at com.ibm.icu.text.RuleBasedBreakIterator.getInstanceFromCompiledRules(RuleBasedBreakIterator.java:71)
at org.apache.lucene.analysis.icu.segmentation.DefaultICUTokenizerConfig.readBreakIterator(DefaultICUTokenizerConfig.java:123)
... 3 more

Finally figured out how to use ICU4J in a java program
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import org.apache.lucene.analysis.icu.segmentation.ICUTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
public class icuEstes {
public static void main(String[] args) throws IOException {
Reader reader = new StringReader("การที่ได้ต้องแสดงว่างานดี This is a test ກວ່າດອກ");
ICUTokenizer icut = new ICUTokenizer();
icut.setReader(reader);
icut.addAttribute(CharTermAttribute.class);
icut.reset();
while (icut.incrementToken()) {
System.out.println(icut.toString());
System.out.println(icut.getAttribute(CharTermAttribute.class));
}
icut.close();
}}

Related

BufferUnderflowException while trying to read an Int from a binary file in Java

I am trying to read 4 bytes which represent an int, located at byte position 64 in a binary file.
This is what I have tried:
package testbinaryfile2;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
public class TestBinaryFile2 {
public static void main(String[] args) throws IOException {
FileChannel fc;
ByteBuffer indexField = ByteBuffer.allocate(4);
fc = (FileChannel.open(Paths.get("myBinaryFile.bin"), StandardOpenOption.READ));
fc.position(64);
fc.read(indexField);
System.out.println(indexField.getInt());
}
}
This is the error I get:
run:
Exception in thread "main" java.nio.BufferUnderflowException
at java.nio.Buffer.nextGetIndex(Buffer.java:509)
at java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:373)
at testbinaryfile2.TestBinaryFile2.main(TestBinaryFile2.java:30)
/home/user/.cache/netbeans/11.3/executor-snippets/run.xml:111: The following error occurred while executing this line:
/home/user/.cache/netbeans/11.3/executor-snippets/run.xml:94: Java returned: 1
BUILD FAILED (total time: 0 seconds)

After read indexField is pointing to its end, so it is necessary to use .rewind() method before getting the int.
This code works:
package testbinaryfile2;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
public class TestBinaryFile2 {
public static void main(String[] args) throws IOException {
FileChannel fc;
ByteBuffer indexField = ByteBuffer.allocate(4);
fc = (FileChannel.open(Paths.get("myBinaryFile.bin"), StandardOpenOption.READ));
fc.position(64);
fc.read(indexField);
indexField.rewind(); // <-- ADD THIS
System.out.println(indexField.getInt());
}
}
Additionally I got a suggestion to use mapped buffer:
public static void main(String[] args) throws Exception {
FileChannel fc = (FileChannel.open(Paths.get("myBinaryFile.bin"), StandardOpenOption.READ));
MappedByteBuffer buffer = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
buffer.position(64);
System.out.println(buffer.getInt());
}
But I am wondering if fc.map reads the whole file at once and uses as much memory as the size of the file.

Getting error while trying to copy a picture in doc file

I am getting below error while trying to copy a pic in a do file through selenium.
This is the error which I am getting -
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/xmlbeans/XmlException
at LeadFreeTest.docCapture.main(docCapture.java:17)
Caused by: java.lang.ClassNotFoundException: org.apache.xmlbeans.XmlException
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 1 more
Below is code
package LeadFreeTest;
import java.io.*;
import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.xwpf.usermodel.*;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import java.io.FileInputStream;
import java.io.FileOutputStream;
public class docCapture {
#SuppressWarnings("resource")
public static void main(String[] args) throws IOException, InvalidFormatException
{
XWPFDocument docx = new XWPFDocument();
XWPFParagraph par = docx.createParagraph();
XWPFRun run = par.createRun();
run.setText("Hello, World. This is my first java generated docx-file. Have fun.");
run.setFontSize(13);
InputStream pic = new FileInputStream("C:\\Naveeen\\TestScreenShot\\LoginPage.png");
//byte [] picbytes = IOUtils.toByteArray(pic);
//run.addPicture(picbytes, Document.PICTURE_TYPE_JPEG);
run.addPicture(pic, Document.PICTURE_TYPE_JPEG, "3", 0, 0);
FileOutputStream out = new FileOutputStream("C:\\Naveeen\\TestScreenShot\\LoginPage.doc");
docx.write(out);
out.close();
pic.close();
}
}

You need to add the XML beans dependency to your classpath hense the
ClassNotFoundException: org.apache.xmlbeans.XmlException
The library is usually called xmlbeans-x.x.x.jar
You can find it here.

EDI Must be a minimum of 1 instances of segment [UNS]

Am newbie to EDI. And i just converted the ORDERS edi file to XML using smooks api. Some of the ORDER example files are working fine in following example. But i got the following exception when i running the following edi file. Am stuck with this. Here is my example and EDI data
package example;
import org.json.JSONObject;
import org.json.XML;
import org.milyn.Smooks;
import org.milyn.SmooksException;
import org.milyn.io.StreamUtils;
import org.milyn.smooks.edi.unedifact.UNEdifactReaderConfigurator;
import org.xml.sax.SAXException;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.StringWriter;
public class Main {
public static int PRETTY_PRINT_INDENT_FACTOR = 4;
protected static String runSmooksTransform() throws IOException, SAXException, SmooksException {
Smooks smooks = new Smooks();
smooks.setReaderConfig(new UNEdifactReaderConfigurator("urn:org.milyn.edi.unedifact:d93a-mapping:*"));
try {
StringWriter writer = new StringWriter();
smooks.filterSource(new StreamSource(new FileInputStream("EDI.edi")), new StreamResult(writer));
return writer.toString();
} finally {
smooks.close();
}
}
public static void main(String[] args) throws IOException, SAXException, SmooksException {
System.out.println("\n\n==============Message In==============");
System.out.println(readInputMessage());
System.out.println("======================================\n");
String messageOut = Main.runSmooksTransform();
System.out.println("==============Message Out=============");
System.out.println(messageOut);
System.out.println("======================================\n\n");
JSONObject xmlJSONObj = XML.toJSONObject(messageOut);
String jsonPrettyPrintString = xmlJSONObj.toString(PRETTY_PRINT_INDENT_FACTOR);
System.out.println(jsonPrettyPrintString);
}
private static String readInputMessage() throws IOException {
return StreamUtils.readStreamAsString(new FileInputStream("EDI.edi"));
}
}
And the exception with Sample EDI Data
Exception in thread "main" org.milyn.SmooksException: Failed to filter source.
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:97)
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:64)
at org.milyn.Smooks._filter(Smooks.java:526)
at org.milyn.Smooks.filterSource(Smooks.java:482)
at org.milyn.Smooks.filterSource(Smooks.java:456)
at example.Main.runSmooksTransform(Main.java:49)
at example.Main.main(Main.java:63)
Caused by: org.milyn.edisax.EDIParseException: EDI message processing failed [ORDERS][D:93A:UN]. Must be a minimum of 1 instances of segment [UNS]. Currently at segment number 9.
at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:499)
at org.milyn.edisax.EDIParser.mapSegments(EDIParser.java:450)
at org.milyn.edisax.EDIParser.parse(EDIParser.java:426)
at org.milyn.edisax.EDIParser.parse(EDIParser.java:410)
at org.milyn.edisax.unedifact.handlers.UNHHandler.process(UNHHandler.java:97)
at org.milyn.edisax.unedifact.handlers.UNBHandler.process(UNBHandler.java:75)
at org.milyn.edisax.unedifact.UNEdifactInterchangeParser.parse(UNEdifactInterchangeParser.java:113)
at org.milyn.smooks.edi.unedifact.UNEdifactReader.parse(UNEdifactReader.java:75)
at org.milyn.delivery.sax.SAXParser.parse(SAXParser.java:76)
at org.milyn.delivery.sax.SmooksSAXFilter.doFilter(SmooksSAXFilter.java:86)
... 6 more

Bad source data will cause this.
It looks like smooks is looking for a UNS segment which isn't in your data. The section control is mandatory per the D.93A standard.

Error while reading .h5 image using HDF5Form class

Here is my Program and,
package myapp;
import java.rmi.RemoteException;
import visad.DataImpl;
import visad.VisADException;
import visad.data.hdf5.HDF5Form;
import ncsa.hdf.hdf5lib.*;
public final class hfAdapter2
{
String filePath;
String path;
public hfAdapter2() throws VisADException, RemoteException
{
filePath="F:\\Devanshi\\Input\\xyz.h5";
System.load("F:\\Devanshi\\Projects\\MyApp\\Lib\\jhdf.dll");
System.load("F:\\Devanshi\\Projects\\MyApp\\Lib\\jhdf5.dll");
HDF5Form h=new HDF5Form();
DataImpl ncData=h.open(filePath);
}
public static void main(String[] args)
throws RemoteException, VisADException, IOException
{
new hfAdapter2();
}
}
The exception I got is :
Exception in thread "main" java.lang.UnsatisfiedLinkError:
ncsa.hdf.hdf5lib.H5.H5Fopen(Ljava/lang/String;II)I
at ncsa.hdf.hdf5lib.H5.H5Fopen(Native Method)
at visad.data.hdf5.hdf5objects.HDF5File.<init>(HDF5File.java:85)
at visad.data.hdf5.HDF5FileAdapted.<init>(HDF5FileAdapted.java:70)
at visad.data.hdf5.HDF5Form.open(HDF5Form.java:102)
at myapp.hfAdapter2.<init>(hfAdapter2.java:46)
at myapp.hfAdapter2.main(hfAdapter2.java:96)
I have tried to solve this by
-Dncsa.hdf.hdf5lib.H5.hdf5lib=F:\DLL\hdf5_ij_plugin\lib\win32
-Djava.library.path=F:\DLL\hdf5_ij_plugin\lib\win32
and also this:
-Dncsa.hdf.hdf5lib.H5.hdf5lib=F:\DLL\hdf5_ij_plugin\lib\win32\jhdf5.lib
-Djava.library.path=F:\DLL\hdf5_ij_plugin\lib\win32\jhdf5.lib

Error in write ArrayList String to File

I am trying to write an arraylist string list to a file. The arraylist string is actually a string converted from twitter JSON and I am trying to write the tweet text into the file.
However, I keep getting this error:
Exception in thread "main" java.lang.NullPointerException
at java.io.Writer.write(Unknown Source)
at kr.ac.uos.datamining.test.main(test.java:32)
The code for the whole class are below:
package kr.ac.uos.datamining;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.*;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import kr.ac.uos.datamining.JSONParser;
import kr.ac.uos.datamining.Tweet;
import kr.ac.uos.datamining.User;
public class test {
public static List <String> list = new ArrayList<String>();
public static void main(String[] args) throws IOException, FileNotFoundException, InterruptedException, SQLException {
JSONParser j = new JSONParser(new File("D:/curl-7.32.0/samsunggalaxy-01-23-2014.txt"));
ArrayList<Tweet> tweets = j.getTweets();
for(Tweet tweet : tweets){
list.add(tweet.getText());
}
FileWriter writer = new FileWriter("D:/samsunggalaxy.txt");
for (String tweet: list) {
Line 32 writer.write(tweet);
}
writer.close();
}
}
Since it is said as Unknown Source, is it the problem with the String tweet: list line?
I tried to change it to String str: list but its not working as well

The only way that you got a NullPoineterException is that your text is null, so validate what you want to write before writing it.
for (String tweet: list) {
if(tweet != null || !tweet.equals("")) {
writer.write(tweet);
}
}

Seems one of your tweet objects is null. This is the problem here.

It seems the String tweet is null, so I'd check your Tweet.getText() method to ensure it never returns null.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Tokenize Thai sentence with ICUTokenizer JAVA - java

Related

BufferUnderflowException while trying to read an Int from a binary file in Java

Getting error while trying to copy a picture in doc file

EDI Must be a minimum of 1 instances of segment [UNS]

Error while reading .h5 image using HDF5Form class

Error in write ArrayList String to File

Categories

Resources