Query Windows Search from Java

I would like to query the Windows Vista Search service directly (or indirectly) from Java.
I know it is possible to query using the search-ms: protocol, but I would like to consume the results within the app.
I have found good information in the Windows Search API documentation, but none of it is related to Java.
I will mark as accepted the answer that provides useful and definitive information on how to achieve this.
Thanks in advance.
EDIT
Does anyone have a JACOB sample, before I can mark this as accepted?
:)

You may want to look at one of the Java-COM integration technologies. I have personally worked with JACOB (JAva COm Bridge):
http://danadler.com/jacob/
It was rather cumbersome (think working exclusively with reflection), but it got the job done for me (a quick proof of concept accessing MapPoint from within Java).
The only other such technology I'm aware of is Jawin, but I don't have any personal experience with it:
http://jawinproject.sourceforge.net/
Update 04/26/2009:
Just for the heck of it, I did more research into Microsoft Windows Search, and found an easy way to integrate with it using OLE DB. Here's some code I wrote as a proof of concept:
import org.jawin.COMException;
import org.jawin.DispatchPtr;
import org.jawin.win32.Ole32;

public class WindowsSearchJawin {
    public static void main(String[] args) {
        DispatchPtr connection = null;
        DispatchPtr results = null;
        try {
            Ole32.CoInitialize();
            // Open an ADO connection to the Windows Search OLE DB provider
            connection = new DispatchPtr("ADODB.Connection");
            connection.invoke("Open",
                "Provider=Search.CollatorDSO;" +
                "Extended Properties='Application=Windows';");
            // Run a Windows Search SQL query against the local index
            results = (DispatchPtr) connection.invoke("Execute",
                "select System.Title, System.Comment, System.ItemName, System.ItemUrl, System.FileExtension, System.ItemDate, System.MimeType " +
                "from SystemIndex " +
                "where contains('Foo')");
            int count = 0;
            // Walk the ADO Recordset until EOF, printing every field of each row
            while (!((Boolean) results.get("EOF")).booleanValue()) {
                ++count;
                DispatchPtr fields = (DispatchPtr) results.get("Fields");
                int numFields = ((Integer) fields.get("Count")).intValue();
                for (int i = 0; i < numFields; ++i) {
                    DispatchPtr item = (DispatchPtr) fields.get("Item", Integer.valueOf(i));
                    System.out.println(item.get("Name") + ": " + item.get("Value"));
                }
                System.out.println();
                results.invoke("MoveNext");
            }
            System.out.println("\nCount:" + count);
        } catch (COMException e) {
            e.printStackTrace();
        } finally {
            // Close in reverse order of creation, skipping objects that were never created
            if (results != null) {
                try {
                    results.invoke("Close");
                } catch (COMException e) {
                    e.printStackTrace();
                }
            }
            if (connection != null) {
                try {
                    connection.invoke("Close");
                } catch (COMException e) {
                    e.printStackTrace();
                }
            }
            try {
                Ole32.CoUninitialize();
            } catch (COMException e) {
                e.printStackTrace();
            }
        }
    }
}
To compile and run this, you'll need the Jawin JAR on your classpath and jawin.dll on your path (or on the java.library.path system property). The code opens an ADO connection to the local Windows Desktop Search index, queries for documents containing the keyword "Foo", and prints a few key properties of each matching document.
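To query for something other than "Foo", the SQL string can be built from a parameter instead of being hard-coded. A minimal sketch, assuming that Windows Search SQL string literals escape an embedded single quote by doubling it and that a SCOPE predicate can restrict results to a folder URL; the class and method names are hypothetical. The returned string would be passed to connection.invoke("Execute", ...) in place of the literal query.

```java
/**
 * Hypothetical helper that builds the Windows Search SQL used above
 * for an arbitrary keyword instead of the hard-coded 'Foo'.
 */
class WindowsSearchSql {
    static final String COLUMNS =
        "System.Title, System.Comment, System.ItemName, System.ItemUrl, "
        + "System.FileExtension, System.ItemDate, System.MimeType";

    /** Doubles embedded single quotes (assumed escaping rule) and wraps the keyword in CONTAINS. */
    static String containsQuery(String keyword) {
        String escaped = keyword.replace("'", "''");
        return "SELECT " + COLUMNS + " FROM SystemIndex WHERE CONTAINS('" + escaped + "')";
    }

    /** Same query restricted to a folder, e.g. scopeUrl = "file:C:/Users" (assumed SCOPE syntax). */
    static String containsQuery(String keyword, String scopeUrl) {
        return containsQuery(keyword) + " AND SCOPE='" + scopeUrl.replace("'", "''") + "'";
    }
}
```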
Let me know if you have any questions, or need me to clarify anything.
Update 04/27/2009:
I also implemented the same thing in JACOB and will be running some benchmarks to compare the two. I may be doing something wrong in JACOB, but it consistently seems to use 10x more memory. If I have time, I'll also work on jcom and com4j implementations and try to figure out some quirks that I believe are due to a lack of thread safety somewhere. I may even try a JNI-based solution. I expect to be done with everything in 6-8 weeks.
Update 04/28/2009:
This is just an update for those who've been following along. It turns out there are no threading issues; I just needed to explicitly close my database resources, since the OLE DB connections are presumably pooled at the OS level (I probably should have closed the connections anyway...). I don't think I'll be posting any further updates to this. Let me know if anyone runs into any problems with it.
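One way to make "always close explicitly" hard to forget is to register each Close call as soon as the object exists and let try-with-resources run them in reverse order. A minimal sketch; CloseStack and CloseAction are hypothetical names, not part of Jawin or JACOB.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper: collects cleanup actions as resources are opened and
 * runs them most-recent-first (recordset before connection) when closed.
 */
class CloseStack implements AutoCloseable {
    /** A cleanup step that may throw, e.g. () -> results.invoke("Close"). */
    interface CloseAction {
        void close() throws Exception;
    }

    private final Deque<CloseAction> actions = new ArrayDeque<>();

    void push(CloseAction action) {
        actions.push(action); // LIFO: the last resource opened is closed first
    }

    /** Runs every action; one failing close does not skip the others. */
    @Override
    public void close() {
        while (!actions.isEmpty()) {
            try {
                actions.pop().close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
```

With the Jawin code above, one would push () -> connection.invoke("Close") right after opening the connection and () -> results.invoke("Close") right after Execute, all inside try (CloseStack closer = new CloseStack()) { ... }, so the recordset is closed before the connection even if something in between throws.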
Update 05/01/2009:
Added a JACOB example per Oscar's request. It goes through exactly the same sequence of calls from a COM perspective, just using JACOB. While it's true that JACOB has been much more actively developed recently, I also noticed that it's quite a memory hog (it uses 10x as much memory as the Jawin version).
import com.jacob.com.Dispatch;
import com.jacob.com.JacobException;

public class WindowsSearchJacob {
    public static void main(String[] args) {
        Dispatch connection = null;
        Dispatch results = null;
        try {
            // Same ADO calls as the Jawin version, expressed through JACOB's static helpers
            connection = new Dispatch("ADODB.Connection");
            Dispatch.call(connection, "Open",
                "Provider=Search.CollatorDSO;Extended Properties='Application=Windows';");
            results = Dispatch.call(connection, "Execute",
                "select System.Title, System.Comment, System.ItemName, System.ItemUrl, System.FileExtension, System.ItemDate, System.MimeType " +
                "from SystemIndex " +
                "where contains('Foo')").toDispatch();
            int count = 0;
            while (!Dispatch.get(results, "EOF").getBoolean()) {
                ++count;
                Dispatch fields = Dispatch.get(results, "Fields").toDispatch();
                int numFields = Dispatch.get(fields, "Count").getInt();
                for (int i = 0; i < numFields; ++i) {
                    Dispatch item = Dispatch.call(fields, "Item", Integer.valueOf(i)).toDispatch();
                    System.out.println(
                        Dispatch.get(item, "Name") + ": " +
                        Dispatch.get(item, "Value"));
                }
                System.out.println();
                Dispatch.call(results, "MoveNext");
            }
        } finally {
            // Only close what was actually created
            if (results != null) {
                try {
                    Dispatch.call(results, "Close");
                } catch (JacobException e) {
                    e.printStackTrace();
                }
            }
            if (connection != null) {
                try {
                    Dispatch.call(connection, "Close");
                } catch (JacobException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}

As a few posts here suggest, you can bridge between Java and .NET or COM using commercial or free frameworks like JACOB, JNBridge, J-Integra, etc.
Actually, I had experience with one of these third-party products (an expensive one :-) ), and I will do my best to avoid repeating that mistake in the future. The reason is that it involves a lot of "voodoo" you can't really debug, and it's very hard to work out what the problem is when things go wrong.
The solution I would suggest is to create a simple .NET application that makes the actual calls to the Windows Search API. Once that's done, you need to establish a communication channel between this component and your Java code. This can be done in various ways: for example, by writing messages to a small database that your application polls periodically, or by registering the component in the machine's IIS (if it exists) and exposing a simple web service API to communicate with it.
I know it may sound cumbersome, but the clear advantages are: a) you communicate with the Windows Search API using the language it understands (.NET or COM), and b) you control all the application paths.

Any reason why you couldn't just use Runtime.exec() to query via search-ms and read the BufferedReader with the result of the command? For example:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ExecTest {
    public static void main(String[] args) throws IOException {
        Process process = Runtime.getRuntime().exec("search-ms:query=microsoft&");
        StringBuilder outputSB = new StringBuilder(40000);
        try (BufferedReader output = new BufferedReader(new InputStreamReader(process.getInputStream()))) {
            String s;
            // Echo each line of the command's output and keep a copy of it
            while ((s = output.readLine()) != null) {
                outputSB.append(s).append('\n');
                System.out.println(s);
            }
        }
        String result = outputSB.toString();
    }
}

There are several libraries out there for calling COM objects from Java. Some are open source (but have a steeper learning curve), and some are closed source with a quicker learning curve; EZCom is a closed-source example. The commercial ones also tend to support calling Java from Windows, something I've never seen in the open-source ones.
In your case, what I would suggest is fronting the call with your own .NET class (I'd use C#, as it is closest to Java without getting into the controversial J#) and focusing on interoperability with that .NET DLL. That way the Windows programming gets easier, and the interface between Windows and Java is simpler.
If you are looking for how to use a Java COM library, MSDN is the wrong place. But MSDN will help you write what you need in .NET; then look at the COM library's tutorials for invoking the one or two methods you need on your .NET objects.
EDIT:
Given the discussion in the answers about using a web service, you could (and will probably have better luck if you) build a small .NET app that calls an embedded Java web server, rather than trying to embed the web service in .NET and have Java consume it. For an embedded web server, my research showed Winstone to be good. It's not the smallest, but it is much more flexible.
To make that work, launch the .NET app from Java and have it poll the web service on a timer or in a loop; when there is a request, the .NET app processes it and sends back the response.
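The Java half of that arrangement might look like the sketch below. It swaps Winstone for the JDK's built-in com.sun.net.httpserver.HttpServer to stay dependency-free, and the /next and /result endpoints and plain-text bodies are hypothetical. The .NET helper (not shown) would poll GET /next and POST its results to /result; launching it from Java could be done with ProcessBuilder.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical Java-side endpoint that a .NET search helper polls for work. */
class SearchRequestServer {
    /** Queries waiting to be picked up by the .NET helper. */
    final BlockingQueue<String> pendingQueries = new LinkedBlockingQueue<>();
    /** Raw result payloads posted back by the helper. */
    final BlockingQueue<String> results = new LinkedBlockingQueue<>();
    private final HttpServer server;

    SearchRequestServer(int port) throws IOException {
        // Bind to loopback only: the helper runs on the same machine
        server = HttpServer.create(new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 0);
        // GET /next -> 200 with the next query, or 204 when nothing is pending
        server.createContext("/next", exchange -> {
            String query = pendingQueries.poll();
            if (query == null) {
                exchange.sendResponseHeaders(204, -1);
                exchange.close();
            } else {
                sendText(exchange, query);
            }
        });
        // POST /result -> store the helper's response body
        server.createContext("/result", exchange -> {
            byte[] body = exchange.getRequestBody().readAllBytes();
            results.add(new String(body, StandardCharsets.UTF_8));
            exchange.sendResponseHeaders(204, -1);
            exchange.close();
        });
        server.start();
    }

    int port() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }

    private static void sendText(HttpExchange exchange, String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, bytes.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(bytes);
        }
    }
}
```

A blocking queue keeps the hand-off simple: Java code adds a query and later takes from results, while the helper only ever sees two small HTTP calls.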

Related

How do I pass multiple types of data over a serial port connection using JSSC?

I am tasked with creating a program that can initiate a payment on an Ingenico card terminal. It is a small part of a larger project to create a custom POS system for a business, but I am facing a variety of roadblocks that make this problem difficult.
I have no experience programming with serial ports. The documentation I have found online only shows writing strings or bytes; the examples are simple but don't tell me enough.
There is no documentation for the device I am using; Ingenico does not provide this information. The only way I have been able to figure out what data the card reader expects in order to initiate a payment is via this already completed project on GitHub:
https://github.com/Ousret/pyTeliumManager
This implementation is in Python and runs on a Linux-based system. We would use it, but we need a more custom implementation, which is why I am doing this in Java.
I have looked and looked through this project to find how the data is structured and then sent over the serial port connection, but out of ignorance I keep missing it. I'm not familiar with Python at all, and the only thing I know is that the data to initiate a payment is as follows:
a float for the transaction amount
three strings: the currency type (USD, EUR, etc.), the payment method (card), and the checkout ID (which can be anything; this is for personal bookkeeping)
three booleans: whether to wait for the transaction to be approved, whether to require bank verification, and whether to save the payee's payment information. (I've set all of these to false as they're not necessary at the moment. I am just trying to write something that works as a proof of concept before building an interface.)
Here's some of my test code; most of it is similar to what I've found on the internet through my research.
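import java.util.Arrays;
import jssc.SerialPort;
import jssc.SerialPortException;

public class Main {
    public static void main(String[] args) {
        SerialPort cerealPort = new SerialPort("COM9");
        try {
            System.out.println("Port opened: " + cerealPort.openPort());
            // Configure the line (9600 baud, 8 data bits, 1 stop bit, no parity) before reading or writing
            System.out.println("Params set: " + cerealPort.setParams(
                    SerialPort.BAUDRATE_9600, SerialPort.DATABITS_8,
                    SerialPort.STOPBITS_1, SerialPort.PARITY_NONE));
            System.out.println("name " + cerealPort.getPortName());
            // readBytes() returns a byte[], so print its contents rather than the array reference
            System.out.println("reading bytes " + Arrays.toString(cerealPort.readBytes()));
            cerealPort.writeString("bing bong");
            //cerealPort.setEventsMask(256);
            System.out.println("\"Hello World!!!\" successfully written to port: "
                    + cerealPort.writeBytes("Hello World!!!".getBytes()));
            System.out.println("Port closed: " + cerealPort.closePort());
        } catch (SerialPortException ex) {
            System.out.println(ex);
        }
    }
}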
This code doesn't really do anything; it was just to test that communication with the device works. Note that nothing happens on the terminal when this code is run.
Now I have this class called TelliumData, which has the data members I described:
public class TelliumData {
    float amount;
    String paymentMode = "debit";
    String currency = "USD";
    String checkoutId = "1";
    boolean transactionWait = false; // waits for the transaction to end; set to true if you need a valid transaction status, otherwise false
    boolean collectPaymentInfo = false;
    boolean forceBankVerify = false;
}
I have no idea how to pass this information to the terminal using the functions in JSSC.
My question is, at its core: how do I send this data over a serial port?
I have tried using writeBytes and writeInt to send all the data over one field at a time, but this doesn't do anything and doesn't trigger a payment initialization on the card reader.
I don't understand how the Python implementation does it either. It would be great if someone could explain how that data is packaged and sent.
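The key point is that a serial port only carries bytes: calling writeInt/writeBytes on each Java field separately produces a byte stream the terminal has no reason to understand. The values have to be serialized into whatever message layout the terminal expects and sent as one frame. The sketch below assumes a common framing for this kind of terminal protocol (STX, an ASCII payload, ETX, then an XOR checksum or LRC over the payload and ETX) and a hypothetical 8-digit amount-in-cents field. The real field order, widths, line settings, and any ENQ/ACK handshake must be checked against the pyTeliumManager source. It uses BigDecimal instead of float, because binary floating point can't represent most cent amounts exactly.

```java
import java.io.ByteArrayOutputStream;
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.nio.charset.StandardCharsets;

/** Hypothetical framing helper; the byte layout is an assumption, not Ingenico's spec. */
class SerialFrames {
    static final byte STX = 0x02; // ASCII start-of-text
    static final byte ETX = 0x03; // ASCII end-of-text

    /** Assumed checksum: XOR of every payload byte and the trailing ETX. */
    static byte lrc(byte[] payload) {
        byte lrc = 0;
        for (byte b : payload) {
            lrc ^= b;
        }
        return (byte) (lrc ^ ETX);
    }

    /** Wraps an ASCII payload as STX + payload + ETX + LRC. */
    static byte[] frame(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(STX);
        out.write(body, 0, body.length);
        out.write(ETX);
        out.write(lrc(body));
        return out.toByteArray();
    }

    /** Hypothetical fixed-width field: amount in cents, zero-padded to 8 digits. */
    static String amountField(BigDecimal amount) {
        long cents = amount.movePointRight(2).setScale(0, RoundingMode.HALF_UP).longValueExact();
        return String.format("%08d", cents);
    }
}
```

With JSSC, the frame would then go out in a single call such as cerealPort.writeBytes(SerialFrames.frame(payload)), where payload is the assembled fixed-width string.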

Create a database / execute a bunch of mysql statements from Java

I have a library that needs to create a schema in MySQL from Java. Currently, I have a dump of the schema that I just pipe into the mysql command. This works okay, but it is not ideal because:
It's brittle: the mysql command needs to be on the path, which usually doesn't work on OS X or Windows without additional configuration.
It's also brittle because the schema is stored as statements, not descriptively.
Java can already access the MySQL database, so it seems silly to depend on an external program to do this.
Does anyone know of a better way to do this? Perhaps...
I can read the statements in from the file and execute them directly from Java? Is there a way to do this that doesn't involve parsing semicolons and dividing up the statements manually?
I can store the schema in some other way, either as a config file or directly in Java rather than as statements (in the style of Rails' db:schema or database.yml); is there a library that will create the schema from such a description?
Here is a snippet of the existing code, which works (when mysql is on the command line):
// Fragment: f is presumably the Path of schema.sql; ByteStreams is Guava's com.google.common.io.ByteStreams
if (db == null) throw new Exception("Need database name!");
String userStr = user == null ? "" : String.format("-u %s ", user);
String hostStr = host == null ? "" : String.format("-h %s ", host);
String pwStr = pw == null ? "" : String.format("-p%s ", pw);
String cmd = String.format("mysql %s %s %s %s", hostStr, userStr, pwStr, db);
System.out.println(cmd + " < schema.sql");
final Process pr = Runtime.getRuntime().exec(cmd);
// Feed the schema file to the client's stdin on a separate thread
new Thread() {
    public void run() {
        try (OutputStream stdin = pr.getOutputStream()) {
            Files.copy(f, stdin);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}.start();
// Echo the client's stdout
new Thread() {
    public void run() {
        try (InputStream stdout = pr.getInputStream()) {
            ByteStreams.copy(stdout, System.out);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}.start();
int exitVal = pr.waitFor();
if (exitVal == 0)
    System.out.println("Create db succeeded!");
else
    System.out.println("Exited with error code " + exitVal);

The short answer (as far as I know) is no.
You will have to do some parsing of the file into separate statements.
I have faced the same situation, and you can find many questions on this topic here on SO.
Some, like here, show a parser; others point to tools, like this post from Apache, that can convert the schema to an XML format and then read it back.
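A minimal sketch of that parsing, assuming a plain mysqldump-style file: it splits on semicolons that fall outside quotes, backticks, and comments, then runs each statement through JDBC. It does not handle DELIMITER changes (used around stored procedures and triggers), it treats any -- as a comment start (slightly broader than MySQL's rule, which requires whitespace after it), and it drops MySQL's /*!...*/ conditional comments, so the session settings mysqldump puts there are lost. The class and method names are made up for illustration.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits a SQL script into statements and runs them via JDBC. */
class SqlScriptRunner {

    static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0; // ', " or ` while inside a quoted token, 0 otherwise
        int i = 0;
        while (i < script.length()) {
            char c = script.charAt(i);
            char next = i + 1 < script.length() ? script.charAt(i + 1) : 0;
            if (quote != 0) {
                current.append(c);
                if (c == '\\' && quote != '`' && next != 0) {
                    current.append(next); // keep backslash escapes such as \' intact
                    i += 2;
                    continue;
                }
                if (c == quote) {
                    quote = 0; // a doubled '' simply closes and reopens the literal
                }
                i++;
            } else if (c == '\'' || c == '"' || c == '`') {
                quote = c;
                current.append(c);
                i++;
            } else if (c == '#' || (c == '-' && next == '-')) {
                while (i < script.length() && script.charAt(i) != '\n') {
                    i++; // skip the line comment but keep the newline
                }
            } else if (c == '/' && next == '*') {
                int end = script.indexOf("*/", i + 2);
                i = end < 0 ? script.length() : end + 2;
                current.append(' '); // keep tokens on either side separated
            } else if (c == ';') {
                addIfNotBlank(statements, current);
                i++;
            } else {
                current.append(c);
                i++;
            }
        }
        addIfNotBlank(statements, current);
        return statements;
    }

    private static void addIfNotBlank(List<String> statements, StringBuilder current) {
        String sql = current.toString().trim();
        if (!sql.isEmpty()) {
            statements.add(sql);
        }
        current.setLength(0);
    }

    /** Executes each statement of the script on an existing JDBC connection. */
    static void run(Connection connection, String script) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            for (String sql : split(script)) {
                statement.execute(sql);
            }
        }
    }
}
```

Usage would be along the lines of SqlScriptRunner.run(conn, new String(Files.readAllBytes(schemaPath), StandardCharsets.UTF_8)), with conn obtained from DriverManager as usual.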
My main reason for writing this answer is to say that, in the end, I chose to use the command line.
Extra configuration: it may be additional work, but you can handle it via config or at runtime based on the system you are running on. You make the effort once and you are done.
Depending on an external tool: it is not as bad as it seems, and it has some benefits too:
1- You don't need to write extra code or introduce additional libraries just to parse the schema commands.
2- The tool is provided by the vendor, so it is probably better debugged and tested than any other parsing code.
3- It is safer in the long run: any additions or changes to the dump format that "might" break a parser will most probably be supported by the tool that ships with the database release, so you won't need to change your code.
4- The nature of the task (creating a schema) does not suggest frequent use, which minimizes the risk of it becoming a performance bottleneck.
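If you stay with the mysql client as this answer recommends, ProcessBuilder's redirects can replace the two copy threads in the question's snippet: the schema file becomes the process's stdin directly, and passing each argument separately avoids the whitespace splitting that Runtime.exec(String) does. A minimal sketch; the class name is hypothetical, and mysqlPath stands for the "config or runtime" setting mentioned above.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: runs mysql with schema.sql as stdin, without copy threads. */
class MysqlSchemaLoader {

    /** Builds the argument list; each value is its own argument, so no shell quoting is needed. */
    static List<String> command(String mysqlPath, String host, String user, String pw, String db) {
        if (db == null) throw new IllegalArgumentException("Need database name!");
        List<String> cmd = new ArrayList<>();
        cmd.add(mysqlPath); // e.g. "mysql", or an absolute path read from config
        if (host != null) { cmd.add("-h"); cmd.add(host); }
        if (user != null) { cmd.add("-u"); cmd.add(user); }
        if (pw != null) cmd.add("-p" + pw); // visible in process listings; a MySQL option file is safer
        cmd.add(db);
        return cmd;
    }

    /** Feeds the schema file to mysql's stdin and sends its output to this console. */
    static int load(List<String> cmd, File schemaFile) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd)
                .redirectInput(schemaFile)
                .redirectOutput(ProcessBuilder.Redirect.INHERIT)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        return process.waitFor(); // 0 means the client reported success
    }
}
```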
I hope you can find the best solution for your needs.

Check out Yank, and more specifically the code examples linked on that page. It's a lightweight persistence layer built on top of DBUtils that hides all the nitty-gritty details of handling connections and result sets. You can also easily load a config file like the one you mentioned, and you can store and load SQL statements from a properties file and/or hard-code them in your code.

Efficient way to GET multiple HTML pages simultaneously

So I'm working on web scraping for a certain website. The problem is:
Given a set of URLs (on the order of hundreds to thousands), I would like to retrieve the HTML of each URL efficiently, especially time-wise. I need to be able to do thousands of requests every 5 minutes.
This would usually mean using a thread pool to make requests from a set of not-yet-requested URLs. But before jumping into implementing this, I think it's worth asking here, since this is a fairly common problem in web scraping and web crawling.
Is there any library that has what I need?

So I'm working on web scraping for a certain website.
Are you scraping a single server, or is the website scraping from multiple other hosts? If it is the former, the server you are scraping may not like too many concurrent connections from a single IP.
If it is the latter, this is really a general question of how many outbound connections you should open from a machine. There is a physical limit, but it is pretty large. In practice it depends on where the client is deployed: the better the connectivity, the more connections it can accommodate.
You might want to look at the source code of a good download manager to see if they have a limit on the number of outbound connections.
Definitely use asynchronous I/O, but you would still do well to limit the number of connections.
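A sketch of "asynchronous I/O, but limit the number", using the JDK 11 java.net.http.HttpClient: a Semaphore caps how many requests are in flight at once, and the fetch function is passed in so it can be swapped for a fake in tests. No particular maxConcurrent value is implied; as the answer says, it depends on the target server and your connectivity. The class name is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Fetches many URLs asynchronously with at most maxConcurrent requests in flight. */
class BoundedFetcher {
    private final Semaphore permits;
    private final Function<String, CompletableFuture<String>> fetch;

    BoundedFetcher(int maxConcurrent, Function<String, CompletableFuture<String>> fetch) {
        this.permits = new Semaphore(maxConcurrent);
        this.fetch = fetch;
    }

    /** Real fetch function: non-blocking GET returning the body as a String. */
    static Function<String, CompletableFuture<String>> httpFetch(HttpClient client) {
        return url -> client.sendAsync(
                        HttpRequest.newBuilder(URI.create(url)).timeout(Duration.ofSeconds(30)).GET().build(),
                        HttpResponse.BodyHandlers.ofString())
                .thenApply(HttpResponse::body);
    }

    /** Returns url -> body for every URL that succeeded; failures are skipped. */
    Map<String, String> fetchAll(List<String> urls) {
        Map<String, String> bodies = new ConcurrentHashMap<>();
        CompletableFuture<?>[] pending = new CompletableFuture<?>[urls.size()];
        for (int i = 0; i < urls.size(); i++) {
            String url = urls.get(i);
            permits.acquireUninterruptibly(); // wait while maxConcurrent requests are running
            CompletableFuture<String> request;
            try {
                request = fetch.apply(url);
            } catch (RuntimeException e) {
                request = CompletableFuture.failedFuture(e);
            }
            pending[i] = request.whenComplete((body, error) -> {
                try {
                    if (error == null && body != null) {
                        bodies.put(url, body);
                    }
                } finally {
                    permits.release(); // free a slot whether the request succeeded or not
                }
            });
        }
        CompletableFuture.allOf(pending).exceptionally(e -> null).join();
        return bodies;
    }
}
```

For real use: new BoundedFetcher(20, BoundedFetcher.httpFetch(HttpClient.newHttpClient())).fetchAll(urls).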

Your bandwidth utilization will be the sum of all of the HTML documents that you retrieve (plus a little overhead) no matter how you slice it (though some web servers may support compressed HTTP streams, so certainly use a client capable of accepting them).
The optimal number of concurrent threads depends a great deal on your network connectivity to the sites in question. Only experimentation can find an optimal number. You can certainly use one set of threads for retrieving HTML documents and a separate set of threads to process them to make it easier to find the right balance.
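The "one set of threads for retrieving, a separate set for processing" idea can be expressed with two executors and CompletableFuture, so each pool can be sized independently while you experiment. A minimal sketch; the class name is hypothetical, and fetch stands for whatever blocking HTTP call you use. A failed fetch makes join() throw a CompletionException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Downloads on one thread pool and processes the HTML on another, so each can be sized separately. */
class FetchThenProcess {
    static <R> List<R> run(List<String> urls,
                           Function<String, String> fetch,   // blocking download, e.g. an HTTP GET
                           Function<String, R> process,      // CPU-bound parsing of the HTML
                           int fetchThreads, int processThreads) {
        ExecutorService fetchPool = Executors.newFixedThreadPool(fetchThreads);
        ExecutorService processPool = Executors.newFixedThreadPool(processThreads);
        try {
            List<CompletableFuture<R>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(CompletableFuture
                        .supplyAsync(() -> fetch.apply(url), fetchPool)
                        .thenApplyAsync(process, processPool));
            }
            List<R> results = new ArrayList<>();
            for (CompletableFuture<R> f : futures) {
                results.add(f.join()); // results keep the input order
            }
            return results;
        } finally {
            fetchPool.shutdown();
            processPool.shutdown();
        }
    }
}
```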
I'm a big fan of HTML Agility Pack for web scraping in the .NET world, but I cannot make a specific recommendation for Java. The following question may be of use in finding a good Java-based scraping platform:
Web scraping with Java

I would start by researching asynchronous communication. Then take a look at Netty.
Keep in mind there is always a limit to how fast one can load a web page. For an average home connection, it will be around a second. Take this into consideration when programming your application.

Use jsoup (http://jsoup.org) just for the scraping part! The thread pooling, I think, you should implement yourself.
Update
If this approach fits your needs, you can download the complete class files here:
http://codetoearn.blogspot.com/2013/01/concurrent-web-requests-with-thread.html
AsyncWebReader webReader = new AsyncWebReader(5 /* number of threads */, new String[]{
    "http://www.google.com",
    "http://www.yahoo.com",
    "http://www.live.com",
    "http://www.wikipedia.com",
    "http://www.facebook.com",
    "http://www.khorasannews.com",
    "http://www.fcbarcelona.com",
    "http://www.khorasannews.com",
});
webReader.addObserver(new Observer() {
    @Override
    public void update(Observable o, Object arg) {
        if (arg instanceof Exception) {
            Exception ex = (Exception) arg;
            System.out.println(ex.getMessage());
        } /* else if (arg instanceof List) {
            List vals = (List) arg;
            System.out.println(vals.get(0) + ": " + vals.get(1));
        } */ else if (arg instanceof Object[]) {
            // As the casts imply: [0] URL -> result map, [1] succeeded URLs, [2] failed URLs
            Object[] objects = (Object[]) arg;
            HashMap result = (HashMap) objects[0];
            String[] success = (String[]) objects[1];
            String[] fail = (String[]) objects[2];
            System.out.println("Failed");
            for (String url : fail) {
                System.out.println(url);
            }
            System.out.println("-----------");
            System.out.println("Success");
            for (String url : success) {
                System.out.println(url);
            }
            System.out.println("\n\nresult of Google: ");
            System.out.println(result.remove("http://www.google.com"));
        }
    }
});
Thread t = new Thread(webReader);
t.start();
t.join();

PHP code to access API of a Java library

I need to use the Java-based OpenNLP library in my PHP code. For example, I need to use its Sentence Detector component (en-sent.bin) for analysing text variables in my PHP code.
According to its documentation, the API can be accessed from Java code as follows:
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceModel;

// Load the sentence detector model; try-with-resources closes the stream
try (InputStream modelIn = new FileInputStream("en-sent.bin")) {
    SentenceModel model = new SentenceModel(modelIn);
} catch (IOException e) {
    e.printStackTrace();
}
How can I do the same thing in PHP?
In other words, what is the PHP-equivalent to the above Java code?

The solution I'm about to implement is writing a Java-based backend (because the OpenNLP tool's output alone just won't cut it) and using PHP's exec() function to run it. If you want it to be even faster, I suggest making it a daemon and having PHP connect to it, send it the data, and read back the results (probably over a Unix socket). Although it's kinda complex, it seems less error-prone than a PHP/Java Bridge.
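A minimal sketch of the daemon idea, assuming a line-based protocol over a localhost TCP socket rather than a Unix socket (Java only supports Unix domain sockets from version 16). The client sends one line of text and gets back one sentence per line until the connection closes. The speed-up comes from loading en-sent.bin once at startup instead of per request. The detector is injected as a function: with OpenNLP it would be something like text -> Arrays.asList(new SentenceDetectorME(model).sentDetect(text)). SentenceService is a hypothetical name, and on the PHP side stream_socket_client() or fsockopen() could open the connection.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;

/** Hypothetical long-running sentence-splitting service for a PHP front end. */
class SentenceService {
    private final Function<String, List<String>> detector;

    SentenceService(Function<String, List<String>> detector) {
        this.detector = detector;
    }

    /** Protocol: read one line of text, write one sentence per line. */
    void handle(BufferedReader in, PrintWriter out) throws IOException {
        String text = in.readLine();
        if (text != null) {
            for (String sentence : detector.apply(text)) {
                out.println(sentence);
            }
        }
        out.flush();
    }

    /** Serves clients one at a time on localhost until the process is stopped. */
    void serve(int port) throws IOException {
        try (ServerSocket server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress())) {
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8))) {
                    handle(in, out);
                } catch (IOException e) {
                    e.printStackTrace(); // one bad client should not stop the daemon
                }
            }
        }
    }
}
```

For a quick local test without OpenNLP, a naive stand-in detector such as t -> Arrays.asList(t.split("(?<=[.!?])\\s+")) is enough.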

There would have to be a PHP API for accessing OpenNLP. A quick search doesn't show anything. The only other thing I can think of is using a PHP/Java Bridge of some sort, but that's more involved. See http://php-java-bridge.sourceforge.net/pjb/, for example.

How Do I Eject a Volume in Java?

How can I "eject" a volume with Java, cross platform?
I have a program that does some operations on a removable drive (USB memory card reader), and once it's done, I want the program to eject/unmount/remove (depending which os lingo we're talking in) the memory card.
Is there a reliable cross-platform method of doing this?

Probably isn't the answer you're looking for, but...
No.
To my knowledge, there isn't an established cross-platform way of doing this. For that matter, I've never come across a Java way of doing it at all. A rather scary C# CodeProject article does allow ejecting devices, but only on Windows.
The various, depressingly poor, Java USB libraries don't even hint at ejecting devices. They don't work across all platforms, so even if they did it wouldn't help you.
My suggestion: gin up some scripts or executables for each platform, and then just spin up a Process as needed.
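A sketch of that per-platform approach. The commands are assumptions to verify on each target system: diskutil eject on macOS, util-linux eject on Linux (which accepts a device or mount point), and on Windows a PowerShell one-liner that invokes the shell's "Eject" verb on the drive. That verb name is localized, so it may not work on non-English systems. EjectCommands is a hypothetical name.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical per-platform eject commands; verify each on the target OS before use. */
class EjectCommands {
    static List<String> forOs(String osName, String target) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac")) {
            // macOS: target is a mount point such as /Volumes/CARD
            return Arrays.asList("diskutil", "eject", target);
        } else if (os.startsWith("windows")) {
            // Windows: target is a drive such as E: ; Namespace(17) is "This PC" / "My Computer"
            String script = "(New-Object -ComObject Shell.Application).Namespace(17).ParseName('"
                    + target + "').InvokeVerb('Eject')";
            return Arrays.asList("powershell", "-NoProfile", "-Command", script);
        } else {
            // Linux and other Unixes: util-linux eject takes a device or mount point
            return Arrays.asList("eject", target);
        }
    }

    /** Runs the command for the current OS and returns its exit code. */
    static int eject(String target) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(forOs(System.getProperty("os.name"), target))
                .inheritIO()
                .start();
        return p.waitFor();
    }
}
```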

A little late, but I thought it was worth sharing...
Since the standard Java API doesn't include this feature, you could use external libraries as mentioned above. However, I personally found it much more convenient (on Windows) to bundle a third-party exe in the JAR's classpath, extract it to the temp folder, execute it when needed, and delete it once the application is done with it.
As the third-party program I used this, a CLI-only program that can do a few tricks with connected devices, and then used this code:
FileUtils.copyInputStreamToFile(MyClass.class.getClassLoader().getResourceAsStream(program),TEMP_EJECT_PROGRAM);
to export it to the temp file location (using Apache Commons IO, though you can definitely do without it), and this code:
private void safelyRemoveDrive(final String driveLetter) {
    new Thread(new Runnable() {
        public void run() {
            if (TEMP_EJECT_PROGRAM.exists()) {
                System.out.println("Removing " + driveLetter);
                try {
                    // Pass arguments separately so a temp path containing spaces is not split
                    Process p = new ProcessBuilder(
                            TEMP_EJECT_PROGRAM.getAbsolutePath(), driveLetter, "-L")
                            .redirectErrorStream(true)
                            .start();
                    // Drain the output before waiting, so a full pipe buffer can't block the child
                    try (Scanner s = new Scanner(p.getInputStream())) {
                        while (s.hasNextLine())
                            System.out.println(s.nextLine());
                    }
                    p.waitFor();
                    System.out.println("Removed " + driveLetter + ".");
                } catch (IOException | InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }
    }).start();
}
to remove the drive. The snippets above are definitely not suited to every application, and the second one in particular is not great; there are much better ways to do it than spawning an anonymous thread... Still, you get the idea behind it :)
Lastly, I suggest you inform the user appropriately and ask for their consent before executing any third-party software on their machine...
I hope this was helpful :-)
