Vowpal Wabbit execute without writing to disk - java

I wrote Java code to execute Vowpal Wabbit in the following way:
System.out.println("Executing command " + command);
final Runtime r = Runtime.getRuntime();
final Process p = r.exec(command);
System.out.println("waiting for the process");
try (final BufferedReader b = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
String line;
while ((line = b.readLine()) != null) {
final T lineResult = textParser.parseLine(line);
parserResultCombiner.addToCombiner(lineResult);
}
}
p.waitFor();
System.out.println("done");
}
where the command is
vw -d input.txt --loss_function=logistic -f model.vw
The disadvantage of this is that the input has to be written to disk first. After some searching, I learned that Vowpal Wabbit supports reading data from standard input (I found an example in R).
I could not find any example of how to accomplish this in Java 1.8. Could anyone share one with me?

You need to start vw in daemon mode. This starts a process that listens on the port specified.
$ vw -i model.vw -t --daemon --quiet --port 26542
Once the daemon has started, you can send samples to predict using socket calls
$ echo " abc-example| a b c" | netcat localhost 26542
0.000000 abc-example
$ echo " xyz-example| x y z" | netcat localhost 26542
1.000000 xyz-example
source:
https://github.com/JohnLangford/vowpal_wabbit/wiki/daemon-example
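The netcat calls above can be reproduced from Java with a plain socket. This is only a sketch: it assumes the daemon was started as shown on port 26542 and answers each example with one line of the form "prediction tag". The class and method names (VwDaemonClient, predictRaw, parsePrediction) are made up for illustration and are not part of vw.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Vowpal Wabbit.
public class VwDaemonClient {

    /** Sends one example line to the daemon and returns the raw response line. */
    public static String predictRaw(String host, int port, String example) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            out.write(example);
            out.write('\n'); // one example per line, like echo ... | netcat
            out.flush();
            return in.readLine();
        }
    }

    /** Extracts the numeric prediction from a response such as "0.000000 abc-example". */
    public static double parsePrediction(String responseLine) {
        String first = responseLine.trim().split("\\s+")[0];
        return Double.parseDouble(first);
    }
}
```

Usage would look like parsePrediction(predictRaw("localhost", 26542, " abc-example| a b c")). Opening one socket per example keeps the sketch short; a long-lived connection is likely cheaper if you score many examples.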
Recently they pushed a Java version of the code that interacts with vw using JNI:
https://github.com/JohnLangford/vowpal_wabbit/tree/master/java
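If you want the standard-input route the question asks about rather than the daemon, the examples can be written to the child process's stdin from Java. The sketch below is generic and works on Java 1.8. It assumes that vw reads examples from stdin when no -d option is given, which you should verify for your version. StdinFeeder and run are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: pipes a string into a command's stdin and returns its stdout.
public class StdinFeeder {

    public static String run(List<String> command, String input) {
        try {
            ProcessBuilder pb = new ProcessBuilder(command);
            // Let stderr go to the parent's stderr so its pipe buffer cannot fill up.
            pb.redirectError(ProcessBuilder.Redirect.INHERIT);
            final Process p = pb.start();

            // Write on a separate thread so a large input cannot deadlock with reading.
            Thread writer = new Thread(() -> {
                try (Writer w = new OutputStreamWriter(p.getOutputStream(), StandardCharsets.UTF_8)) {
                    w.write(input);
                } catch (IOException ignored) {
                    // The child closed its stdin early; nothing more to send.
                }
            });
            writer.start();

            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            writer.join();
            p.waitFor();
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Under the assumption above, the call would be StdinFeeder.run(Arrays.asList("vw", "--loss_function=logistic", "-f", "model.vw"), examples), where examples holds the lines that used to live in input.txt. Note that -f model.vw still writes the model file to disk; only the input file is avoided.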

Related

Shell Scripts hangs when running through ProcessBuilder

I have a Java program that triggers a shell script.
The Java code sample is:
ProcessBuilder pb = new ProcessBuilder(cmdList);
p = pb.start();
p.waitFor();
Here cmdList contains all the input arguments required to execute the shell script. The script has a for loop that executes some DB scripts and writes result info and error logs to a file.
Below is the sample shell script code:
#!/bin/bash
export PATH=/apps/PostgresPlus/as9.6/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
set -eE
#################################################### START
TIME_ELAPSED=""
TIME_ELAPSED_IN_HOURS=""
SCRIPT_START_TIME_FORMATTED=`date '+%F %T'`
SCRIPT_START_TIME_IN_SEC=`date +%s`
PROCESS_LOG_BASE_PATH="/data/logs/purge_log/"
PROCESS_LOG="$PROCESS_LOG_BASE_PATH/purge.log"
trap 'err=$?; logError 2>&1 "Error occurred during purging. Exiting with status $err at line $LINENO: ${BASH_COMMAND}. Please check logs for more info." >>$PROCESS_LOG' ERR
trap 'logError 2>&1 "Error occurred during purging. Exiting shell script execution as an external interrupt was received. Please check logs for more info." >>$PROCESS_LOG; trap ERR' INT
banner()
{
echo "+------------------------------------------------------------------------------------------------+"
printf "|`tput bold`[ %-40s `tput sgr0`|\n" "$1 ] `tput setaf 2` $2"
echo "+------------------------------------------------------------------------------------------------+"
}
logError()
{
printf "[ProcessId- $$] [`date "+%Y-%m-%d %H:%M:%S"`] `tput setaf 1` `tput bold` [ERROR] `tput setaf 1` %-40s `tput sgr0`\n" "$@"
}
logInfo(){
printf "[ProcessId- $$] [`date "+%Y-%m-%d %H:%M:%S"`] `tput setaf 6` `tput bold` [INFO] %-40s `tput sgr0`\n" "$@"
}
logWarn(){
printf "[ProcessId- $$] [`date "+%Y-%m-%d %H:%M:%S"`] `tput setaf 3` `tput bold` [WARNING] %-40s `tput sgr0`\n" "$@"
}
logHint(){
printf "[ProcessId- $$] [`date "+%Y-%m-%d %H:%M:%S"`] `tput setaf 5` `tput sitm` %-40s `tput sgr0`\n" "$@"
}
main()
{
banner "$SCRIPT_START_TIME_FORMATTED" "Started processing" | tee -a $PROCESS_LOG
logInfo "Started execution at $SCRIPT_START_TIME_FORMATTED" | tee -a $PROCESS_LOG
set PGPASSWORD=$DB_PASSWORD
export PGPASSWORD=$DB_PASSWORD
# Call DB function for audit and category wise data purging, population of schema names
SCHEMA_NAMES_RESULT=$(psql -h $HOST_NAME -d $DB_NAME -U $DB_USER -p $DB_PORT -At -c "SELECT $COMMON_SCHEMA_NAME.purge_audit_and_populate_schema_names('$COMMON_SCHEMA_NAME', $PURGE_DATA_INTERVAL_IN_DAYS,'$SCHEMA_NAMES',$NUM_TOP_CONTRIBUTING_TENANTS)")
SCHEMA_NAMES_RESULT=$(echo "$SCHEMA_NAMES_RESULT" | sed 's/{//g; s/}//g; s/"//g' )
SCHEMA_NAMES=$(echo $SCHEMA_NAMES_RESULT | rev | cut -d"," -f2- | rev)
#Convert comma separated string of tenants to array
SCHEMA_NAMES=($(echo "$SCHEMA_NAMES" | tr ',' '\n'))
# loop for multi schema
for element in "${SCHEMA_NAMES[@]}"
do
logInfo "Effective tenant - $element, Script start time - $SCRIPT_START_TIME_FORMATTED" | tee -a $PROCESS_LOG
# PGSQL call to DB function to execute purging
logInfo "Time elapsed since script execution started - $TIME_ELAPSED" | tee -a $PROCESS_LOG
done
#logInfo "Purge completed!" | tee -a $PROCESS_LOG
logInfo "Purge execution completed successfully at `date '+%F %T'`" | tee -a $PROCESS_LOG
exit 0
}
mkdir -p $PROCESS_LOG_BASE_PATH
main "$@"
#################################################### END
These are my observations with this program.
When I run the shell script directly from PuTTY, it executes properly without any error.
When I trigger the shell script through the Java program above, I observe the following behavior:
a. It hangs after a certain number of iterations of the for loop.
b. As I reduce the number of log entries written by the shell script, the number of completed iterations increases.
c. When I removed all info logs and printed only error logs, it completed successfully.
Can someone please help identify the reason behind this behavior?
For now, I have put a limit on the number of iterations in the for loop, but the problem could recur at any time if the script starts writing many error log entries.
Regards
Kushagra
You have to consume the process streams or map err and out to file so the native buffers don't fill up. It works better if you create threads to consume each stream. The hacky single thread version is something like this:
ProcessBuilder pb = new ProcessBuilder(cmdList);
p = pb.start();
try (InputStream in = p.getInputStream();
InputStream err = p.getErrorStream();
OutputStream closeOnly = p.getOutputStream()) {
while (p.isAlive()) {
long skipped = 0L;
try {
skipped = in.skip(in.available())
+ err.skip(err.available());
} catch (IOException jdk8155808) {
byte[] b = new byte[2048];
int read = in.read(b, 0, Math.min(b.length, in.available()));
if (read > 0) {
skipped += read;
}
read = err.read(b, 0, Math.min(b.length, err.available()));
if (read > 0) {
skipped += read;
}
}
if(skipped == 0L) {
p.waitFor(5L, TimeUnit.MILLISECONDS);
}
}
} finally {
p.destroy();
}
The thread way works like this:
public void foo() {
class DevNull implements Runnable {
private final InputStream is;
DevNull(final InputStream is) {
this.is = Objects.requireNonNull(is);
}
public void run() {
byte[] b = new byte[64];
try {
while (is.read(b) >= 0);
} catch(IOException ignore) {
}
}
}
ExecutorService e = Executors.newCachedThreadPool();
ProcessBuilder pb = new ProcessBuilder(cmdList);
Process p = pb.start();
try (InputStream in = p.getInputStream();
InputStream err = p.getErrorStream();
OutputStream closeOnly = p.getOutputStream()) {
e.execute(new DevNull(in));
e.execute(new DevNull(err));
p.waitFor();
} finally {
p.destroy();
e.shutdown();
}
}
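The other option mentioned above, mapping err and out to a file, needs no reader threads at all because the operating system writes the child's output straight to disk. A minimal sketch, assuming you are happy to have stdout and stderr merged into one log file; RedirectToFile and run are hypothetical names:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical helper: runs a command with all output appended to a log file.
public class RedirectToFile {

    /** Returns the exit status of the command once it has finished. */
    public static int run(List<String> command, File logFile) {
        try {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.redirectErrorStream(true);                                 // merge stderr into stdout
            pb.redirectOutput(ProcessBuilder.Redirect.appendTo(logFile)); // no pipe, so no full buffer
            Process p = pb.start();
            p.getOutputStream().close();                                  // child sees EOF on stdin
            return p.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

With the script from the question this would be called as RedirectToFile.run(cmdList, new File("/data/logs/purge_log/java-side.log")), where the log file name is only an example.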
Thanks, the multithreaded version worked for me.
The single-threaded option failed on skip().
Thanks again for helping to resolve the issue.

Read output git-bash with ProcessBuilder in Java

I can't read the output of my git command that I run in git bash in my Java application with ProcessBuilder.
OS : Windows 8.1 --- IDE : IntelliJ
My code tries to list all the files in a GitHub repository and count the Java files.
Complete git command (pipe type):
cd C:/Users/Utente/Documents/Repository-SVN-Git/Bookkeeper && git ls-files | grep .java | wc -l
The result is present in my git bash but it is not shown in my Java application and I cannot understand why this is so.
Result in git-bash :
2140
Result in IntelliJ :
--- Command run successfully
--- Output=
This is my class java:
public class SelectMetrics {
public static final String path_bash = "C:/Program Files/Git/git-bash.exe";
public static final String path_repository = "cd C:/Users/Utente/Documents/Bookkeeper";
public static final String git_command = "git ls-files | grep .java | wc -l";
public static final String command_pipe = path_repository + " && " + git_command;
public static void main(String[] args) {
runCommandPIPE(command_pipe);
}
public static void runCommandPIPE(String command) {
try {
ProcessBuilder processBuilder = new ProcessBuilder();
processBuilder.command(path_bash, "-c", command);
Process process = processBuilder.start();
StringBuilder output = new StringBuilder();
BufferedReader reader = new BufferedReader(
new InputStreamReader(process.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
output.append(line + "\n");
}
int exitVal = process.waitFor();
if (exitVal == 0) {
System.out.println(" --- Command run successfully");
System.out.println(" --- Output=" + output);
} else {
System.out.println(" --- Command run unsuccessfully");
}
} catch (IOException | InterruptedException e) {
System.out.println(" --- Interruption in RunCommand: " + e);
// Restore interrupted state
Thread.currentThread().interrupt();
}
}
}
---- EDIT ----
I have found a way to take the git-bash output by printing it in a txt file and then reading it from my java application. Here you can find the code:
Open git bash using processBuilder and execute command in it
However I still don't understand why I can't read the output with ProcessBuilder
The problem is the use of
C:/Program Files/Git/git-bash.exe
because it opens the interactive window that a user works in. When launching from a Java application at runtime, you should instead use
C:/Program Files/Git/bin/bash.exe
In this way ProcessBuilder can read the result of the git operations.
ProcessBuilder cannot read from the git-bash.exe window, so it is expected that the captured output is empty. If you run commands through git-bash.exe at runtime, the result is shown only in that window and the Java application cannot read it.
--- EDIT 2021/03/26 ---
In conclusion, to run a command with git-bash and read its output at runtime in your Java application, you have to change the following line of the code in my question:
public static final String path_bash = "C:/Program Files/Git/bin/bash.exe";
then
Result in git-bash :
2183
Result in IntelliJ :
--- Command run successfully
--- Output=2183

Facing some problems while running the Java Program through Shell Script

I have written a shell script that automatically
1) starts the Hadoop services (namenode, datanode, jobtracker, tasktracker, secondary namenode),
2) drops all tables from Hive,
3) imports all tables into Hive again from SQL Server.
I am calling this shell script from Java. Below are the shell script and the Java code.
Shell Script:
export HADOOP_HOME=/home/hadoop/hadoop-0.20.2-cdh3u2/
export HIVE_HOME=/home/hadoop/hive-0.7.1/
export SQOOP_HOME=/home/hadoop/sqoop-1.3.0-cdh3u1/
export MSSQL_CONNECTOR_HOME=/home/hadoop/sqoop-sqlserver-1.0
export HBASE_HOME=/home/hadoop/hbase-0.90.1-cdh3u0
export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.3.1+10
export SQOOP_CONF_DIR=/home/hadoop/sqoop-1.3.0-cdh3u1/conf/
/home/hadoop/hadoop-0.20.2-cdh3u2/bin/hadoop/start-all.sh
/home/hadoop/hadoop-0.20.2-cdh3u2/bin/hadoop -rmr /user/hadoop/*
/home/hadoop/hive-0.7.1/bin/hive -e 'show tables' > TablesToDelete.txt
while read line1
do
echo 'drop table '$line1
/home/hadoop/hive-0.7.1/bin/hive -e 'drop table '$line1
done < TablesToDelete.txt
while read line
do
echo $line" ------------------------------"
/home/hadoop/sqoop-1.3.0-cdh3u1/bin/sqoop-import --connect 'jdbc:sqlserver://192.168.1.1;username=abcd;password=12345;database=HadoopTest' --table line --hive-table $line --create-hive-table --hive-import -m 1 --hive-drop-import-delims --hive-home /home/hadoop/hive-0.7.1 --verbose
done < /home/hadoop/sqoop-1.3.0-cdh3u1/bin/tables.txt
Java Code:
public class ImportTables
{
public static void main(String arsg[])
{
PrintWriter pw=null;
try
{
Formatter formatter = new Formatter();
String LogFile = "Log-"+ formatter.format("%1$tm%1$td-%1$tH%1$tM%1$tS", new Date());
File f=new File("/home/hadoop/"+LogFile);
FileWriter fw1=null;
pw=new PrintWriter(f);
String cmd = "/home/hadoop/sqoop-1.3.0-cdh3u1/bin/TablesToImport.sh"; // this is the command to execute in the Unix shell
// create a process for the shell
ProcessBuilder pb = new ProcessBuilder("bash", "-c", cmd);
pb.redirectErrorStream(true); // use this to capture messages sent to stderr
Process shell = pb.start();
InputStream shellIn = shell.getInputStream(); // this captures the output from the command
int shellExitStatus = shell.waitFor();
// wait for the shell to finish and get the return code
// at this point you can process the output issued by the command
// for instance, this reads the output and writes it to System.out:
int c;
while ((c = shellIn.read()) != -1)
{
System.out.write(c);
}
// close the stream
shellIn.close();
}
catch(Exception e)
{
e.printStackTrace();
e.printStackTrace(pw);
pw.flush();
System.exit(1);
}
}
}
But when I run the program I see nothing on the console, and the program keeps running.
And if I put only the following code in the shell script:
/home/hadoop/hive-0.7.1/bin/hive -e 'show tables' > TablesToDelete.txt
while read line1
do
echo 'drop table '$line1
/home/hadoop/hive-0.7.1/bin/hive -e 'drop table '$line1
done < TablesToDelete.txt
Then the output is:
Cannot find hadoop installation: $HADOOP_HOME must be set or hadoop must be in the path
What is the problem in my program/script? Where and how should I set HADOOP_HOME and the other paths in my script?
The call to waitFor is a blocking call, just as the name implies. It halts further execution until the process is done. But since your code is also the sink for the process's stdout, the whole thing blocks. Just move the waitFor to after you've processed the script's output.
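A minimal sketch of that reordering: read the merged output until end of stream, and only then call waitFor. The extraEnv parameter is an addition of mine, not part of the answer above. It uses ProcessBuilder.environment() to pass variables such as HADOOP_HOME to the child, which may or may not be what your "Cannot find hadoop installation" message needs. ReadThenWait and runAndCapture are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

// Hypothetical helper: drains the child's output first, then waits for it to exit.
public class ReadThenWait {

    public static String runAndCapture(List<String> command, Map<String, String> extraEnv) {
        try {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.environment().putAll(extraEnv); // e.g. HADOOP_HOME (assumption, see above)
            pb.redirectErrorStream(true);      // stderr arrives on the same stream as stdout
            Process shell = pb.start();

            StringBuilder output = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(shell.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) { // ends when the child closes stdout
                    output.append(line).append('\n');
                }
            }

            int exitStatus = shell.waitFor();  // returns promptly because the output is drained
            if (exitStatus != 0) {
                throw new IllegalStateException("exit status " + exitStatus + ":\n" + output);
            }
            return output.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

For the script in the question the call might be runAndCapture(Arrays.asList("bash", "-c", cmd), Collections.singletonMap("HADOOP_HOME", "/home/hadoop/hadoop-0.20.2-cdh3u2/")).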

Difference - executing a unix command through java and through prompt i.e normal execution

Could you please help me resolve this issue?
I have Java code that runs the rsync command using the Runtime object.
I run the code below on the source machine. If there is an rsync connectivity problem on the target machine during the sync, the code should receive an exit value, but that is not happening.
String rsyncCommand = "rsync –abv <source> <remoteAddr:dest>"
Runtime rt = Runtime.getRuntime ();
rt.exec(rsyncCommand);
To give you more details:
When I run the rsync command directly (not through Java code) on the source machine and kill the rsync process on the target machine with kill -9 during the sync, the rsync process on the source exits with an exit message.
But if I run rsync through my Java code and kill the process on the target during the sync, it does not receive any exit message. The Java and rsync processes keep running but do no work.
What is the difference between running the command through Java and running it directly from the command prompt?
Has anyone had a similar problem with rsync? Are there other options for running rsync through Java? I tried ProcessBuilder as well.
Please give me some pointers to solve this issue.
Thanks for the response. I gave only sample code earlier; below is the complete code that I am using.
Runtime rt = Runtime.getRuntime();
Process proc = null;
try {
proc = rt.exec(rsyncCommand);
InputStream stderr = proc.getErrorStream();
InputStreamReader isrErr = new InputStreamReader(stderr);
BufferedReader brErr = new BufferedReader(isrErr);
InputStream stdout = proc.getInputStream();
InputStreamReader isrStd = new InputStreamReader(stdout);
BufferedReader brStd = new BufferedReader(isrStd);
String val = null;
while ((val = brStd.readLine()) != null) {
System.out.println(val);
}
while ((val = brErr.readLine()) != null) {
System.out.println(val);
}
int exitVal = proc.waitFor();
} catch (Exception e) {
e.printStackTrace();
}
If you do this while the process has not finished yet, you will not receive an exit value (exitValue() throws IllegalThreadStateException instead):
Process process = rt.exec(rsyncCommand);
int exitValue = process.exitValue();
Instead, you should use
int exitValue = process.waitFor();
Then the thread waits until the process returns its exit value.
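The difference between the two calls can be seen in a small sketch. It assumes a Unix-like system where sleep and sh are available; ExitValueDemo and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical demo of Process.exitValue() versus Process.waitFor().
public class ExitValueDemo {

    /** True if exitValue() throws because the child is still running. */
    public static boolean exitValueThrowsWhileRunning(List<String> longRunningCommand) {
        Process p = start(longRunningCommand);
        try {
            p.exitValue(); // does not block; throws if the process has not exited yet
            return false;
        } catch (IllegalThreadStateException stillRunning) {
            return true;
        } finally {
            p.destroy();
        }
    }

    /** Blocks until the child exits and returns its exit status. */
    public static int waitForExit(List<String> command) {
        Process p = start(command);
        try {
            return p.waitFor();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static Process start(List<String> command) {
        try {
            // inheritIO() sends the child's output to the parent's console, so no pipe can fill.
            return new ProcessBuilder(command).inheritIO().start();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```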
Your invocation of exec() is fragile. It is better to specify the parameters directly, something like:
Runtime rt = Runtime.getRuntime ();
rt.exec(new String[]{"rsync", "-abv", "<source>", "<remoteAddr:dest>"});
exec(String) does not run the command through a shell. It only splits the string on whitespace, so quotes, wildcards, and redirection are not interpreted. Passing an array makes every argument explicit.
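To see what a single command string turns into, you can mimic the splitting step yourself. The Javadoc for Runtime.exec(String) says the string is broken into tokens with new StringTokenizer(command); the sketch below reproduces only that step and is not the JDK's own code. ExecTokens and tokenize are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper that mimics the whitespace split applied to a command string.
public class ExecTokens {

    public static List<String> tokenize(String command) {
        StringTokenizer st = new StringTokenizer(command); // splits on space, tab, newline, CR, FF
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }
}
```

A plain command such as "rsync -abv src host:dest" splits into four arguments as expected, but "sh -c 'echo hi'" also splits into four tokens because the quotes are treated as ordinary characters. That is why the array form is safer whenever an argument contains spaces or quotes.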

Sleep OS X from Java

Really simple little function, but does anyone know how to sleep OS X from Java?
Cheers
Runtime.getRuntime().exec(new String[] {"osascript", "-e", "tell application \"System Events\" to sleep"});
See: Making Mac OS X sleep from the command line
Create a script with the following:
#!/bin/bash
osascript << EOT
tell application "System Events"
sleep
end
EOT
And use system to exec it.
public void gotoSleep(){
try{
logger.finer("Zzz...");
if (preferences.getOS().equals("OSX") == true ){
Process p = Runtime.getRuntime().exec
("/bin/bash");
String command = "osascript -e 'tell application \"System Events\"' "
+ " -e \"sleep\" -e 'end tell'";
OutputStream stdin = p.getOutputStream();
stdin.write( command.getBytes() );
stdin.flush();
stdin.close();
}
}catch( Exception e ) {
logger.warning( e.toString() );
}
}
For some reason, when I tried it, it did not work unless I executed it through bash.
