Download ".java" files only from the given website/url - java

For research purposes, I want to download 1000 Java source (".java") files from a given website. I don't want to do this manually.
For example, the page below has many Java source files that I want to fetch with a script or program. I've worked with Linux shell scripts, PHP, and Java, so any solution using these is appreciated.
http://www.cs.uic.edu/~sloan/CLASSES/java/
Thanks!

Based on the question
wget -A java -r https://www.cs.uic.edu/~sloan/CLASSES/java/
will download all ".java" files in the same directory structure as on the server.
This will also download the robots.txt file.

For the particular example you gave,
curl -s https://www.cs.uic.edu/~sloan/CLASSES/java/ | grep -oP '(?<=")[^"]*\.java(?=")' | sed -e 's|^|https://www.cs.uic.edu/~sloan/CLASSES/java/|' | xargs wget
Explanation:
1) Fetch the page and print it to stdout. This gives you the entire HTML.
curl -s https://www.cs.uic.edu/~sloan/CLASSES/java/
2) Find each quoted string ending in .java ("[ANYTHING].java") and output it without the quotes. This gives you something like HelloWorld.java.
grep -oP '(?<=")[^"]*\.java(?=")'
3) Add a prefix to make each one a full URL so it can be downloaded. This gives you something like https://www.cs.uic.edu/~sloan/CLASSES/java/HelloWorld.java
sed -e 's|^|https://www.cs.uic.edu/~sloan/CLASSES/java/|'
4) Download them to the current directory.
xargs wget
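The extraction steps can also be wrapped in a small function, which makes them easy to try on a saved HTML snippet without touching the network. This is only a sketch: the name extract_java_links is made up for illustration, it assumes links appear as lowercase href="..." attributes with relative paths, and it assumes the base URL contains no | or & characters (both are special in the sed expression).

```shell
# Hypothetical helper (not part of any tool): read HTML on stdin and print
# the absolute URL of every linked .java file, one per line.
# $1 is the base URL to prepend; it is assumed to end with "/".
extract_java_links() {
  base_url=$1
  # Keep only href attributes whose value ends in .java,
  # then strip the attribute syntax and prepend the base URL.
  grep -oE 'href="[^"]*\.java"' |
    sed -e 's/^href="//' -e 's/"$//' -e "s|^|${base_url}|"
}
```

Usage would look like: curl -s "$url" | extract_java_links "$url" | xargs wget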

Thank you all!
I got it done using "wget -r -l1 -nd -nc -A.java http://www.cs.uic.edu/~sloan/CLASSES/java/"
That covered my required task. But I'm wondering whether the same "wget" approach could be extended to crawl the internet and fetch 1000 ".java" files. Perhaps we could invoke a Google search (from a script) for a keyword such as "java tutorials" and then scan the returned URLs for ".java" files.
Thanks again, all.
Viki.

Related

can't make a shell script to make a jar file

I am trying to write a simple shell script to build a jar file. The jar command combined with -C does not work with wildcards. Therefore I use find with a wildcard to locate the files I want, write them to a file, and loop over them.
It looks something like this:
the_classes=''
cd "$bin_folder"
tmp_dir=$(mktemp -d -t java_sucks)
find "imui/core/" -type f -name "IMUI_Widget_Agent*.class" >"$tmp_dir/classes.txt"
while IFS="" read -r p || [ -n "$p" ]
do
the_classes="${the_classes} -C '$bin_folder' '$p'"
done < "$tmp_dir/classes.txt"
Using the above I complete the command:
cmd='jar cfm build/IMUI_Widget_Agent.jar'
cmd="${cmd} \"$bin_folder/imui/core/IMUI_Widget_Agent_MANIFEST.MF\" $the_classes"
printf "\n\n\ncmd\n\n\n"
echo $cmd
Now if I copy and paste this command and execute it, it works!
But I want to avoid the manual labour of copying and pasting every time.
Now I have:
eval "$("$cmd")"
But I get the error "File name too long". No matter what I try, every fix creates a new problem. I have been working on this script for 6 hours now.
What would be a good step forward?
Since you cd "$bin_folder" you don't actually need -C "$bin_folder":
#!/bin/bash
shopt -s globstar
cd "$bin_folder"
jar cfm build/IMUI_Widget_Agent.jar \
imui/core/IMUI_Widget_Agent_MANIFEST.MF \
imui/core/**/IMUI_Widget_Agent*.class
However, if you still want to add them as part of a larger script, you can easily and robustly build your command in an array:
#!/bin/bash
shopt -s globstar
cmd=(jar cfm build/IMUI_Widget_Agent.jar imui/core/IMUI_Widget_Agent_MANIFEST.MF)
cd "$bin_folder"
for file in imui/core/**/IMUI_Widget_Agent*.class
do
cmd+=(-C "$bin_folder" "$file")
done
echo "About to execute: "
printf "%q " "${cmd[@]}"
echo
"${cmd[@]}"
Alternatively, you can simply do eval "$cmd" with your code, which is equivalent to echo and copy-pasting. However, be aware that this is fragile and error prone because it requires careful escaping of the filenames which you're not currently doing.
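For scripts that have to run under plain sh, where arrays are not available, the same idea can be sketched with the positional parameters instead. The helper names below (show_args, build_and_run) are hypothetical and exist only for this illustration; show_args stands in for the real jar invocation so the word boundaries are visible.

```shell
# Print each argument on its own line in brackets, to make word boundaries visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Hypothetical helper: $1 is the directory for -C, the rest are file names.
# The loop rotates through "$@", replacing each FILE with: -C DIR FILE
build_and_run() {
  dir=$1
  shift
  for file in "$@"; do
    shift                           # drop the file name from the front
    set -- "$@" -C "$dir" "$file"   # append its expanded form at the back
  done
  # In a real script this would be something like: jar cfm out.jar MANIFEST "$@"
  show_args "$@"
}
```

Because every element is passed as a separate quoted word, file names containing spaces survive intact, which is the property that the eval approach loses without careful escaping.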

Sejda Merge PDFs w/ Filelist

I'm trying to merge pdfs with an external CSV file. The CSV file and all pdfs are in the same directory. I've got the following code, but it keeps spitting back errors.
Call sejda-console.bat merge -f C:\temp -l C:\temp\book3.csv -o C:\temp\temp2.pdf
The tutorial mentions "[--filesListConfig -l value]"; I'm just not sure whether I'm using it right.
Thanks in advance.
You should not provide -f and -l at the same time.
Here's an example that works (commands are Unix format):
./bin/sejda-console merge -l /tmp/files.csv -o /tmp/output.pdf
And files.csv contains:
> sejda-console-1.0.0.M9 $ cat /tmp/files.csv
/tmp/file1.pdf,/tmp/file2.pdf
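
A list file in that single-line, comma-separated shape could be generated rather than typed by hand. The following is a rough sketch only: make_file_list is a made-up name, the name-sorted merge order is an assumption, and it assumes no file name contains a comma or a newline.

```shell
# Hypothetical helper: print the PDFs directly inside directory $1 as one
# comma-separated line, sorted by name (the merge order is an assumption).
make_file_list() {
  dir=$1
  find "$dir" -maxdepth 1 -type f -name '*.pdf' | sort | paste -sd, -
}
```

Usage would look like: make_file_list /tmp/pdfs > /tmp/files.csv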

Installing JDK on Linux

I’m trying to install JDK on Centos, but I’m getting an error:
[root@www opt]# rpm -ivh /opt/jdk-7u51-linux-x64.rpm
Preparing... ########################################### [100%]
1:jdk ########################################### [100%]
error: unpacking of archive failed on file /usr/java/jdk1.7.0_51/db/lib/derbyLocale_zh_CN.jar;533ab42a: cpio: read
Where could the problem be?
Thanks
Can you please check the md5sum of the downloaded file against these values here: http://www.oracle.com/technetwork/java/javase/downloads/java-se-binaries-checksum-1956892.html ?
If the md5sum does not match, it must be a broken download.
A short answer: try downloading and installing again.
A longer answer:
Even though you are getting this, the jdk is getting installed. Sort of. This seems to be an error in the packaging but it is not fatal. Here is the output from my machine. In the example I cat'd the release file so you can see what CentOS I'm using and did an md5sum on the rpm I'm using. You'll notice that it doesn't match the md5sum from the Oracle site link (provided above by Kristof).
Next, I did an ls on the target directory for the jdk /usr/java/ so you can see that it is empty. Then ran the rpm install and did another ls to show that some files are there.
Of course, this defeats the purpose of using rpm, because the install information is not going to go into the rpm database. If I query the database it doesn't come up and it's missing the links that usually get installed pointing to 'latest' and 'default'
So I took a guess that maybe there is a mirror that might have a bad rpm and went to the Oracle site and downloaded the file again. I gave it a different name so that I can compare the two files. This time it was successful.
In the screen shot below you can see the two files. The newest download being the one I called jdk-7u51-linux-x86_64.rpm. You can see the md5sum shows the files are different, but on the new file, the md5 matches the Oracle site. I removed the directory that was created by the failed install and then ran the rpm -ivh command and it completed successfully. The ls on the /usr/java directory shows all the correct stuff and querying the rpm database shows it was installed. Hope this helps!
You can do it with the simple script below.
It's written for a CentOS rpm, but you can edit the script and change rpm to tar.gz, for example.
You can also change the version or environment in the variables below.
BASE_URL="technetwork/java/javase/downloads"
BASE_URL_OUTPUT="$(curl -s -L0 http://www.oracle.com/${BASE_URL}/)"
JAVA_ENVIRONMENT="JDK"
JAVA_BASE_VERSION=8
DOWNLOAD_SITE="$(echo $BASE_URL_OUTPUT | grep -m 1 -io "${JAVA_ENVIRONMENT}${JAVA_BASE_VERSION}-downloads-[0-9]*.html" -- | tail -1)"
echo "DOWNLOAD_SITE="$DOWNLOAD_SITE
DOWNLOAD_LINK_OUTPUT="$(curl -s -L -j -H "Cookie: oraclelicense=accept-securebackup-cookie" http://www.oracle.com/${BASE_URL}/${DOWNLOAD_SITE} | grep -io "filepath.*${JAVA_ENVIRONMENT}-[${JAVA_BASE_VERSION}].*linux[-_]x64[._].*\(rpm\)" -- | cut -d '"' -f 3 | tail -1)"
echo "DOWNLOAD_LINK_OUTPUT="$DOWNLOAD_LINK_OUTPUT
curl -L -o java_rpm_packet.rpm -b "oraclelicense=a" $DOWNLOAD_LINK_OUTPUT
and install with
yum localinstall -y java_rpm_packet.rpm

How to find specific string text within direct .class file or .class file inside various jar file which are kept inside several sub-directories

I am worried about System.out.println calls in our Java code, and not all of the .java sources are available to me. We only have .class files in production, and thousands of them contain System.out.println.
To clean this up, how can I find all the offending class files that contain this string?
I know of javap, which disassembles .class files, but I don't know whether it can be used for this purpose.
Any Unix command, Java program, awk script, or known UI tool will work for me.
I just want to get rid of System.out.println.
I used the command below in Cygwin, but with no success:
find . -iname '*.class' -printf "%p | grep -q 'System.out.println' && echo %p\n" | sh
I am using GNU grep version 2.10.
Given a sample class file which contains System.out.println, I get
bash$ grep 'System.out.println' sample.class
Binary file sample.class matches
So, in order to list the files with matches, you could use something like:
#! /bin/bash
shopt -s globstar nocaseglob
files=(**/*.class)
for (( i=0; i<${#files[@]}; i++ )) ; do
file="${files[$i]}"
res=$(grep 'System.out.println' "$file")
[[ $? == 0 ]] && awk '{print $3}' <<< "$res"
done
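A shorter variant of the same search is sketched below, using grep -l so that only the names of matching files are printed. The function name find_matching_classes is hypothetical. One caveat, stated as an assumption worth checking against a known class file: compiled classes may store class, field, and method names as separate constant-pool entries, so a shorter pattern such as println might be needed instead of the full System.out.println.

```shell
# Hypothetical helper: list every .class file under directory $1 that
# contains the fixed string $2 (no regex interpretation), sorted by path.
find_matching_classes() {
  root=$1
  pattern=$2
  # -l prints only file names; -F treats the pattern as a literal string.
  find "$root" -type f -name '*.class' -exec grep -l -F -e "$pattern" {} + | sort
}
```

Usage would look like: find_matching_classes . println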

How to search for a string in JAR files

My application is built on Java EE.
I have approximately 50 jars in this application.
Is it possible to search for a particular keyword (actually I want to search for a keyword BEGIN REQUEST)?
You can use zipgrep on Linux or OSX:
zipgrep "BEGIN REQUEST" file.jar
If you wish to search a number of jars, do
find libdir -name "*.jar" -exec zipgrep "BEGIN REQUEST" '{}' \;
where libdir is a directory containing all jars. The command will recursively search subdirectories too.
For windows, you can download cygwin and install zipgrep under it: http://www.cygwin.com/
Edit 1
To view the name of the file that the expression was found you could do,
find libdir -name "*.jar" | xargs -I{} sh -c 'echo searching in "{}"; zipgrep "BEGIN REQUEST" {}'
Edit 2
Simpler version of Edit 1
find libdir -name "*.jar" -print -exec zipgrep "BEGIN REQUEST" '{}' \;
Caution: This is not an accurate answer, it's only a quick heuristic approach. If you need to find something like the name of a class (e.g., which jar has class Foo?) or maybe a method name, then this may work.
grep --text 'your string' your.jar
This will search the jar file as if it were text. This is quicker because it doesn't expand the archive, but that is also why it is less accurate. If you need to be exhaustive then this is not the approach you should use, but if you want to try something a little quicker before pulling out zipgrep this is a good approach.
From man grep,
-a, --text
Process a binary file as if it were text; this is equivalent
to the --binary-files=text option.
On Android I had to search both jar and aar files for a certain string. Here is my implementation on a Mac:
find . -name "*.jar" -o -name "*.aar" | xargs -I{} zipgrep "AssestManager" {}
This finds all jar and aar files under the current directory (find is recursive by default) and pipes the results to xargs, which passes each file name to zipgrep as a parameter. The braces at the end tell xargs where to put each file name received from find. To search your entire home directory, just change find . to find ~
Searching inside a jar, or finding the class name that contains a particular text, is very easy with WinRAR's search. It's efficient and has always worked for me, at least.
Just open any jar in WinRAR and click ".." until you reach the top folder from which you want to start the search (including subfolders).
Make sure to check the below options:
1.) Provide '*' in fields 'file names to find', 'Archive types'
2.) select check boxes 'find in subfolders', 'find in files', 'find in archives'.
Found the script below on alvinalexander.com. It is simple but useful for searching through all jar files in the current directory
#!/bin/sh
LOOK_FOR="codehaus/xfire/spring"
for i in `find . -name "*jar"`
do
echo "Looking in $i ..."
jar tvf "$i" | grep "$LOOK_FOR" > /dev/null
if [ $? -eq 0 ]
then
echo "==> Found \"$LOOK_FOR\" in $i"
fi
done
Replace "codehaus..." with your query, i.e. a class name.
Sample output:
$ ./searchjars.sh
Looking in ./activation-1.1.jar ...
Looking in ./commons-beanutils-1.7.0.jar ...
Looking in ./commons-codec-1.3.jar ...
Looking in ./commons-pool.jar ...
Looking in ./jaxen-1.1-beta-9.jar ...
Looking in ./jdom-1.0.jar ...
Looking in ./mail-1.4.jar ...
Looking in ./xbean-2.2.0.jar ...
Looking in ./xbean-spring-2.8.jar ...
Looking in ./xfire-aegis-1.2.6.jar ...
Looking in ./xfire-annotations-1.2.6.jar ...
Looking in ./xfire-core-1.2.6.jar ...
Looking in ./xfire-java5-1.2.6.jar ...
Looking in ./xfire-jaxws-1.2.6.jar ...
Looking in ./xfire-jsr181-api-1.0-M1.jar ...
Looking in ./xfire-spring-1.2.6.jar ...
==> Found "codehaus/xfire/spring" in ./xfire-spring-1.2.6.jar
Looking in ./XmlSchema-1.1.jar ...
One-liner solution that prints only the names of the files in which the search string is found; it doesn't clutter your console with unnecessary "searching in" logs:
find libdir -wholename "*.jar" | xargs --replace={} bash -c 'zipgrep "BEGIN REQUEST" {} &>/dev/null; [ $? -eq 0 ] && echo "{}";'
Edit: Removing the unnecessary exit-status test and using -name instead of -wholename (I actually used -wholename, but it depends on your scenario and preferences):
find libdir -name "*.jar" | xargs --replace={} bash -c 'zipgrep "BEGIN REQUEST" {} &>/dev/null && echo "{}";'
You can also use sh instead of bash.
One last thing, --replace={} is just equivalent to -I{} (I usually use long option formats, to avoid having to go into the manual again later).
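A related sketch avoids xargs altogether by using the fact that find's -exec acts as a test: -print is only reached when the executed command succeeds. The function name jars_matching is hypothetical, and the check command is passed in by the caller, so nothing here is specific to zipgrep.

```shell
# Hypothetical helper: $1 is the directory to search; the remaining
# arguments form a check command that receives each jar path as its last
# argument. Jars for which the check exits with status 0 are printed.
jars_matching() {
  root=$1
  shift
  find "$root" -type f -name '*.jar' -exec "$@" {} \; -print | sort
}
```

Usage would look like: jars_matching libdir some_check "BEGIN REQUEST", where some_check is a placeholder for a quiet wrapper around zipgrep that discards its output, as the one-liners above do with &>/dev/null.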
Fastjar is very old, but it fits your needs. Fastjar contains a tool called jargrep (or grepjar), which is used the same way as grep:
> locate .jar | grep hibernate | xargs grepjar -n 'objectToSQLString'
org/hibernate/type/EnumType.class:646:objectToSQLString
org/hibernate/sql/Update.class:576:objectToSQLString
org/hibernate/sql/Insert.class:410:objectToSQLString
org/hibernate/usertype/EnhancedUserType.class:22:objectToSQLString
org/hibernate/persister/entity/SingleTableEntityPersister.class:2713:objectToSQLString
org/hibernate/hql/classic/WhereParser.class:1910:objectToSQLString
org/hibernate/hql/ast/tree/JavaConstantNode.class:344:objectToSQLString
org/hibernate/hql/ast/tree/BooleanLiteralNode.class:240:objectToSQLString
org/hibernate/hql/ast/util/LiteralProcessor.class:1363:objectToSQLString
org/hibernate/type/BigIntegerType.class:114:objectToSQLString
org/hibernate/type/ShortType.class:189:objectToSQLString
org/hibernate/type/TimeType.class:307:objectToSQLString
org/hibernate/type/CharacterType.class:210:objectToSQLString
org/hibernate/type/BooleanType.class:180:objectToSQLString
org/hibernate/type/StringType.class:166:objectToSQLString
org/hibernate/type/NumericBooleanType.class:128:objectToSQLString
org/hibernate/type/CustomType.class:543:objectToSQLString
org/hibernate/type/TimeZoneType.class:204:objectToSQLString
org/hibernate/type/DateType.class:343:objectToSQLString
org/hibernate/type/LiteralType.class:18:objectToSQLString
org/hibernate/type/ByteType.class:189:objectToSQLString
org/hibernate/type/LocaleType.class:259:objectToSQLString
org/hibernate/type/CharBooleanType.class:171:objectToSQLString
org/hibernate/type/TimestampType.class:409:objectToSQLString
org/hibernate/type/CurrencyType.class:256:objectToSQLString
org/hibernate/type/AbstractCharArrayType.class:219:objectToSQLString
org/hibernate/type/FloatType.class:177:objectToSQLString
org/hibernate/type/DoubleType.class:173:objectToSQLString
org/hibernate/type/LongType.class:223:objectToSQLString
org/hibernate/type/IntegerType.class:188:objectToSQLString
The commands below show the results along with the file name and the jar file name.
To find a string in a list of jar files:
find <%PATH of the Folder where you need to search%> -name "*.jar" -print -exec zipgrep "jar$|<%STRING THAT YOU NEED TO FIND%>" '{}' \;
To find a class name in a list of jar files:
find . -name "*.jar" -print -exec jar tvf {} \; |grep -E "jar$|<%CLASS NAME THAT YOU NEED TO FIND%>\.class"
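The combined stream from the class-name command still lists every jar, matching or not. A small awk filter can pair each matching entry with the jar it came from. This is a sketch under assumptions: jars_containing is a made-up name, entry paths are assumed to contain no spaces, nested entries whose names end in .jar would be mistaken for jar headers, and the match is a plain substring match (so Bar also matches FooBar).

```shell
# Hypothetical helper: read the output of
#   find . -name "*.jar" -print -exec jar tvf {} \;
# on stdin and print "JAR: ENTRY" for entries containing "$1.class".
jars_containing() {
  class=$1
  awk -v cls="$class" '
    /\.jar$/ { jar = $0; next }                     # line printed by find -print
    index($0, cls ".class") { print jar ": " $NF }  # entry line from jar tvf
  '
}
```

Usage would look like: find . -name "*.jar" -print -exec jar tvf {} \; | jars_containing MyClass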
Using jfind jar
JFind can find a Java class file anywhere on the filesystem, even if
it is hidden many levels deep in a jar within an ear within a zip!
http://jfind.sourceforge.net/
There are ways of doing this with a decompiler or Eclipse, but it gets tricky when the jars are not part of your project, and it is particularly painful with a decompiler when you have hundreds or thousands of jars spread across several folders.
I found this CMD command useful; it helps find class names in the jars present in a directory.
forfiles /S /M *.jar /C "cmd /c jar -tvf @file | findstr "classname" && echo @path"
You can either navigate to your desired path, open cmd from there, and run this command, or give the path directly in the command itself, like this:
forfiles /S /M *.jar /C "cmd /c jar -tvf @file | findstr /C:"classname" && echo @path"
My use case was to find a particular class in Glassfish, so the command would look something like this:
