Remove shell control and non-printable characters from a String (Linux output)

In a web scanner application, I need to parse the output of some scripts to extract information, but I don't get the same output in the Linux shell as I do in Java. Let me describe it (this example uses whatweb on one of the websites I need to scan at work, but I have the same problem whenever the shell output is colored):
Here is what I get in the Linux shell (with some colors):
http://www.ceris-ingenierie.com [200] Apache[2.2.9], Cookies[ca67a6ac78ebedd257fb0b4d64ce9388,jfcookie,jfcookie%5Blang%5D,lang], Country[EUROPEAN UNION][EU], HTTPServer[Fedora Linux][Apache/2.2.9 (Fedora)], IP[185.13.64.116], Joomla[1.5], Meta-Author[Administrator], MetaGenerator[Joomla! 1.5 - Open Source Content Management], PHP[5.2.6,], Plesk[Lin], Script[text/javascript], Title[Accueil ], X-Powered-By[PHP/5.2.6, PleskLin]
And here is what I get in Java:
[1m[34mhttp://www.ceris-ingenierie.com[0m [200] [1m[37mApache[0m[[1m[32m2.2.9[0m], [1m[37mCookies[0m[[1m[33mca67a6ac78ebedd257fb0b4d64ce9388,jfcookie,jfcookie%5Blang%5D,lang[0m], [1m[37mCountry[0m[[1m[33mEUROPEAN UNION[0m][[1m[35mEU[0m], [1m[37mHTTPServer[0m[[1m[31mFedora Linux[0m][[1m[36mApache/2.2.9 (Fedora)[0m], [1m[37mIP[0m[[1m[33m185.13.64.116[0m], [1m[37mJoomla[0m[[1m[32m1.5[0m], [1m[37mMeta-Author[0m[[1m[33mAdministrator[0m], [1m[37mMetaGenerator[0m[[1m[33mJoomla! 1.5 - Open Source Content Management[0m], [1m[37mPHP[0m[[1m[32m5.2.6,[0m], [1m[37mPlesk[0m[[1m[33mLin[0m], [1m[37mScript[0m[[1m[33mtext/javascript[0m], [1m[37mTitle[0m[[32mAccueil [0m], [1m[37mX-Powered-By[0m[[1m[33mPHP/5.2.6, PleskLin[0m]
My guess is that the colors in the Linux shell are generated by those unknown characters, but they are a real pain to parse in Java.
I get this output by running the script in a new thread and doing raw_data+=data; (where raw_data is a String) whenever there is a new line of output, and finally sending raw_data to my parser.
How can I avoid getting those annoying characters and get the friendlier output I see in the Linux shell?

Where your Java code executes the shell script, you can add an extra sed filter to strip the shell control characters.
# filter out shell control characters
./my_script | sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]//g"
Use tr -dc '[:print:]\n' to remove the remaining non-printable characters while keeping line breaks, like this:
# filter out shell control characters, then any other non-printable characters
./my_script | \
sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]//g" | \
tr -dc '[:print:]\n'
You could even put this in a wrapper script around the original script and call the wrapper instead. That lets you do any other pre-processing before the output reaches the Java program, keeps the Java side free of unnecessary code, and lets you focus on the core logic of the application.
If you can't add a wrapper script for any reason and would like to add the filter in Java, note that Java doesn't directly support pipes in the command. You'll have to pass your command as an argument to a shell, like this:
// Run the pipeline through a shell, because exec() itself does not interpret '|'
String[] cmd = {
    "/bin/sh",
    "-c",
    "./my_script | sed -r 's/\\x1B\\[([0-9]{1,2}(;[0-9]{1,2})?)?[mK]//g'"
};
Process p = Runtime.getRuntime().exec(cmd);
Don't forget to escape every '\' when you write the regex inside a Java string.
Source and description for the sed filter: http://www.commandlinefu.com/commands/view/3584/remove-color-codes-special-characters-with-sed

You can use a regex here:
String raw_data = ...;
String cleaned_raw_data = raw_data.replaceAll("\\u001B\\[[\\d;]*m", "");
This removes any sequence that starts with the escape character (\\u001B) followed by a [, ends with an m, and has zero or more digits or semicolons in between ([\\d;]*).
Note that [ is preceded by \\ because [ has a special meaning in regular expressions (it's a meta-character).
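To make the regex approach concrete, here is a minimal sketch that strips color sequences and then any other control characters from a string. The class name AnsiStripper and the method clean are made up for this illustration, and the pattern assumes the output only contains CSI-style sequences (ESC, then [, then digits and semicolons, then a letter), which is what the sample output in the question looks like.

```java
import java.util.regex.Pattern;

public class AnsiStripper {
    // CSI sequence: ESC, '[', optional digits/semicolons, then a final letter such as 'm' or 'K'.
    private static final Pattern ANSI_CSI = Pattern.compile("\\u001B\\[[0-9;]*[A-Za-z]");

    // Any remaining ASCII control character except newline and tab.
    private static final Pattern CONTROL = Pattern.compile("[\\p{Cntrl}&&[^\\n\\t]]");

    // Hypothetical helper: returns the text without color codes or stray control characters.
    public static String clean(String raw) {
        String withoutColors = ANSI_CSI.matcher(raw).replaceAll("");
        return CONTROL.matcher(withoutColors).replaceAll("");
    }

    public static void main(String[] args) {
        String raw = "\u001B[1m\u001B[37mApache\u001B[0m[\u001B[1m\u001B[32m2.2.9\u001B[0m]";
        System.out.println(clean(raw)); // prints Apache[2.2.9]
    }
}
```

Cleaning each line before appending it to raw_data keeps the parser free of any terminal-specific handling.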

Related

Get specific java version with powershell

I have some issues with getting the java version out as a string.
In a batch script I have done it like this:
for /f tokens^=2-5^ delims^=.-_^" %%j in ('%EXTRACTPATH%\Java\jdk_extract\bin\java -fullversion 2^>^&1') do set "JAVAVER=%%j.%%k.%%l_%%m"
The output is: 1.8.0_121
Now I want to do the same in PowerShell, but my output is 1.8.0_12; the final "1" is missing. I have tried Trim and Split, but nothing gives me the right output. Can someone help me out?
This is what I've got so far in PowerShell:
$javaVersion = (& $extractPath\Java\jdk_extract\bin\java.exe -fullversion 2>&1)
$javaVersion = "$javaVersion".Trim("java full version """).TrimEnd("-b13")
The full output is: java full version "1.8.0_121-b13"

TrimEnd() works a little differently than you might expect:
'1.8.0_191-b12'.TrimEnd('-b12')
results in: 1.8.0_19 and so does:
'1.8.0_191-b12'.TrimEnd('1-b2')
The reason is that TrimEnd() removes a trailing set of characters, not a substring. So .TrimEnd('-b12') means: remove every character belonging to the set '-b12' from the end of the string, and keep going until a character outside the set is found. That includes the last '1' before the '-'.
A better solution in your case would be -replace:
'java full version "1.8.0_191-b12"' -replace 'java full version "(.+)-b\d+"','$1'
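The difference between trimming a set of characters and removing a substring can be sketched in Java as well. The helper trimEndChars below is a hypothetical re-implementation written only to mimic the behavior described above; it is not a standard library method. The second helper uses the same pattern idea as the -replace solution.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrimEndDemo {
    // Hypothetical helper: drops trailing characters for as long as they belong to charSet.
    static String trimEndChars(String s, String charSet) {
        int end = s.length();
        while (end > 0 && charSet.indexOf(s.charAt(end - 1)) >= 0) {
            end--;
        }
        return s.substring(0, end);
    }

    // Captures everything between the opening quote and the "-b<digits>" build suffix.
    static String extractVersion(String fullVersionOutput) {
        Matcher m = Pattern.compile("\"(.+)-b\\d+\"").matcher(fullVersionOutput);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The trailing '1' of "191" is in the set, so it is trimmed as well.
        System.out.println(trimEndChars("1.8.0_191-b12", "-b12")); // prints 1.8.0_19
        System.out.println(extractVersion("java full version \"1.8.0_191-b12\"")); // prints 1.8.0_191
    }
}
```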

Use a regular expression for matching and extracting the version number:
$javaVersion = if ((& java -fullversion 2>&1) -match '\d+\.\d+\.\d+_\d+') {
    $matches[0]
}
or
$javaVersion = (& java -fullversion 2>&1 | Select-String '\d+\.\d+\.\d+_\d+').Matches[0].Groups[0].Value

Using map functionality in shell script

I have below shellscript
MYMAP12=$(java -jar hello-0.0.1-SNAPSHOT.jar)
echo "==="
echo ${MYMAP12}
The output of java -jar hello-0.0.1-SNAPSHOT.jar is the map {one=one, two=two, three=three}.
How can I get each element by its key in the shell script?
I tried echo ${MYMAP12{one}}, but it gave me an error.

As @chepner implied, the Java code just outputs a text string, which has to be parsed and manipulated in bash to make it useful. There are no doubt several ways to do this; here is one that uses pure bash (i.e. no external programs):
# This is the text string supplied by Java
MYMAP12='{one=one, two=two, three=three}'
# Create an associative array called 'map'
declare -A map
# Remove first and last characters ( { and } )
MYMAP12=${MYMAP12#?}
MYMAP12=${MYMAP12%?}
# Remove ,
MYMAP12=${MYMAP12//,/}
# The list is now delimited by spaces, the default in a shell
for item in $MYMAP12
do
    # This splits around '='
    IFS='=' read -r key val <<< "$item"
    map[$key]=$val
done
echo "keys: ${!map[@]}"
echo "values: ${map[@]}"
Gives:
keys: two three one
values: two three one
EDIT:
You should use the correct tool for the job: if you need an associative array (map, hash table, dictionary), then you need a language with that feature. These include bash, ksh, awk, perl, ruby, python and C++.
You can extract the keys and values using a POSIX shell (sh) but you cannot store them in an associative array since sh does not have that feature. The best you can do is a generic list, which is just a text string of whitespace separated values. What you can do is to write a lookup function which emulates it:
get_value() {
    map="$1"
    key="$2"
    # Iterate over the list passed as the first argument
    for pair in $map
    do
        if [ "$key" = "${pair%=*}" ]
        then
            value="${pair#*=}"
            # Remove last character ( , or } )
            value=${value%?}
            echo "$value"
            return 0
        fi
    done
    return 1
}
MYMAP12='{kone=one, ktwo=two, kthree=three}'
# Remove first character ( { )
MYMAP12=${MYMAP12#?}
val=$(get_value "$MYMAP12" "ktwo")
echo "value for 'ktwo' is $val"
Gives:
value for 'ktwo' is two
Using this function you can also test for the presence of a key, for example:
if get_value "$MYMAP12" "kfour"
then
echo "key kfour exists"
else
echo "key kfour does not exist"
fi
Gives:
key kfour does not exist
Note that this is inefficient compared to an associative array since we are sequentially searching a list, although with a short list of only three keys you won't see any difference.

If you change the output format of your Java program to the form on the right-hand side of this assignment:
$ x="( [one]=foo [two]=bar [three]=baz )"
then you can use bash associative arrays:
$ declare -A map="$x"
$ echo "${map[one]}"
foo
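On the Java side, producing that format could look roughly like the sketch below. The class BashMapPrinter and its methods are hypothetical names chosen for this illustration; the sketch assumes keys are simple words without spaces or brackets, and it single-quotes each value so that spaces survive when bash evaluates the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class BashMapPrinter {
    // Wraps a value in single quotes, escaping embedded single quotes the usual shell way.
    static String quote(String value) {
        return "'" + value.replace("'", "'\\''") + "'";
    }

    // Hypothetical helper: renders the map as "( [key]='value' ... )".
    static String toBashLiteral(Map<String, String> map) {
        StringJoiner joiner = new StringJoiner(" ", "( ", " )");
        for (Map.Entry<String, String> entry : map.entrySet()) {
            joiner.add("[" + entry.getKey() + "]=" + quote(entry.getValue()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("one", "foo");
        map.put("two", "bar");
        map.put("three", "baz");
        System.out.println(toBashLiteral(map)); // prints ( [one]='foo' [two]='bar' [three]='baz' )
    }
}
```

Because bash evaluates this text when it is assigned with declare -A, only feed it output from a program you trust.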

Getting no such file error when trying to run Maven wrapper? [duplicate]

I am trying to format a variable in Linux:
str="Initial Value = 168"
echo "New Value=$(echo $str| cut -d '=' -f2);">>test.txt
I am expecting the following output
Value = 168;
But instead get
Value = 168 ^M;

Don't edit your bash script on DOS or Windows. If you already have, you can run dos2unix on the script. The issue is that Windows uses "\r\n" as a line separator, whereas Linux uses "\n". You can also remove the "\r" characters manually in an editor on Linux.
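If the same kind of text reaches a Java program, the stray "\r" can be handled there too. The following is a minimal sketch with made-up names (CarriageReturnCleaner, stripTrailingCr, toUnixLineEndings); it only illustrates the "\r\n" versus "\n" difference described above.

```java
public class CarriageReturnCleaner {
    // Removes one trailing '\r', as left behind when a CRLF line is split on '\n' only.
    static String stripTrailingCr(String line) {
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    // Converts every Windows line ending (CRLF) into a Unix line ending (LF).
    static String toUnixLineEndings(String text) {
        return text.replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        String line = "Initial Value = 168\r";
        System.out.println("[" + stripTrailingCr(line) + "]"); // prints [Initial Value = 168]
        System.out.println(toUnixLineEndings("a\r\nb\r\n").equals("a\nb\n")); // prints true
    }
}
```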

str="Initial Value = 168"
newstr="${str##* }"
echo "$newstr" # 168
Pattern matching is the way to go.

Try this:
#! /bin/bash
str="Initial Value = 168"
awk '{print $2"="$4}' <<< "$str" > test.txt
Output:
cat test.txt
Value=168
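I got a comment saying that this doesn't address ^M, but it actually does (in this sample the carriage return ends up in a separate fifth field, because a space precedes it):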
echo -e 'Initial Value = 168 \r' | cat -A
Initial Value = 168 ^M$
After awk:
echo -e 'Initial Value = 168 \r' | awk '{print $2"="$4}' | cat -A
Value=168$

First off, always quote your variables.
#!/bin/bash
str="Initial Value = 168"
echo "New Value=$(echo "$str" | cut -d '=' -f2);"
For me, this results in the output:
New Value= 168;
If you're getting a carriage return between the digits and the semicolon, then something may be wrong with your echo, or perhaps your input data is not what you think it is. Perhaps you're editing your script on a Windows machine and copying it back, and your variable assignment is getting DOS-style newlines. From the information you've provided in your question, I can't tell.
At any rate I wouldn't do things this way. I'd use printf.
#!/bin/bash
str="Initial Value = 168"
value=${str##*=}
printf "New Value=%d;\n" "$value"
The output of printf is predictable, and it handily strips off gunk like whitespace when you don't want it.
Note the replacement of your cut. The functionality of bash built-ins is documented in the Bash man page under "Parameter Expansion", if you want to look it up. The replacement I've included here is not precisely the same functionality as what you've got in your question, but is functionally equivalent for the sample data you've provided.

NASHORN $EXEC issue with sed

I am trying to execute the command below with Nashorn to pull a section out of a log:
$EXEC("sed '1,/Token to find:/d;/Another token to find:/,$d' /path/to/log/file.log")
But it ends with:
Exit Code:1, Error Msg::sed: -e expression #1, char 1: unknown command: `''
Trying the same at the Linux command prompt with single quotes ('), it is able to pull out the log section:
sed '1,/Token to find:/d;/Another token to find:/,$d' /path/to/log/file.log
On the other hand, when I change to double quotes (""), I get the same error:
sed "1,/Token to find:/d;/Another token to find:/,$d" /path/to/log/file.log
sed: -e expression #1, char 1: unknown command: `
Any idea what the right way is?

After trying different combinations of " and ' with sed, it looks like the trouble comes from the several layers of scripting expressions involved (JavaScript, Linux shell script, and the sed command itself).
As a workaround, I moved the expression into a text file and gave sed its location:
sed --file=/sed/expression/file /path/to/log/file.log
or
$EXEC("sed --file=/sed/expression/file /path/to/log/file.log");
var output=$OUT
var exitErrorMsg="Exit Code:" + $EXIT + ", Error Msg::" + $ERR
It now works like a charm!
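Another way to sidestep the quoting layers, if calling plain Java is an option, is to hand sed its arguments as a list so that no shell (and therefore no quote character) is involved. This is only a sketch under that assumption: SedRunner, sedCommand and run are hypothetical names, it uses the standard ProcessBuilder API rather than $EXEC, and it assumes sed is on the PATH.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class SedRunner {
    // Each list element reaches sed as one argument, so the script needs no surrounding quotes.
    static List<String> sedCommand(String script, String file) {
        return Arrays.asList("sed", script, file);
    }

    // Runs the command and returns its combined stdout/stderr as a single string.
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        process.waitFor();
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        List<String> command = sedCommand(
                "1,/Token to find:/d;/Another token to find:/,$d",
                "/path/to/log/file.log");
        System.out.print(run(command));
    }
}
```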

How to validate the format number.number.number;number with a regex

My string should be in the format number.number.number;number, e.g. 15.2.63;4.
How can I validate this format with a regex? I have done it the usual way with contains, split, etc., but that takes many lines of code. How can I do it with a regex?

You can go with this:
^\d+[.]\d+[.]\d+;\d+$

There are many ways to do it; here is one using PCRE:
laptop:~$ echo "12.34.56;7" | perl -ne 'print $_ if (/^\d+\.\d+\.\d+;\d+$/);'
12.34.56;7
laptop:~$ echo "12a.34b.56c;7" | perl -ne 'print $_ if (/\d+\.\d+\.\d+;\d+/);'
laptop:~$ echo "12.34.56;7" | perl -ne 'print $_ if (/^(\d+\.){2}\d+;\d+$/);'
12.34.56;7
If you know the exact length of each part, you can also fix it.
For example, ^\d{2}\. matches 11. but not 123.
The answer above puts the dot in a bracket expression ([.]); this is unnecessary for a single character, since escaping it (\.) does the same thing.
But if your delimiter may vary, you can use, for example, [.;-] to allow ., ; and - as delimiters.
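For completeness, the same check in Java might look like the sketch below. The class name VersionFormatValidator and the method isValid are hypothetical; matches() requires the whole string to match, so the ^ and $ anchors are not needed here.

```java
import java.util.regex.Pattern;

public class VersionFormatValidator {
    // number.number.number;number, where each number is one or more digits.
    private static final Pattern FORMAT = Pattern.compile("\\d+\\.\\d+\\.\\d+;\\d+");

    // Hypothetical helper: true only if the entire string has the expected shape.
    static boolean isValid(String candidate) {
        return candidate != null && FORMAT.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("15.2.63;4"));     // prints true
        System.out.println(isValid("12a.34b.56c;7")); // prints false
    }
}
```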

Try this:
^[1-9]\d*\.[1-9]\d*\.[1-9]\d*;[1-9]\d*$
Note that this variant rejects numbers with a leading zero (and zero itself). Hope this helps.
