The image below shows the format of my settings file for a web bot I'm developing. If you look at line 31 in the image you will see it says chromeVersion. This is so the program knows which version of the chromedriver to use. If the user enters an invalid response or leaves the field blank the program will detect that and determine the version itself and save the version it determines to a string called chromeVersion. After this is done I want to replace line 31 of that file with
"(31) chromeVersion(76/77/78), if you don't know this field will be filled automatically upon the first run of the bot): " + chromeVersion
To be clear I do not want to rewrite the whole file I just want to either change the value assigned to chromeVersion in the text file or rewrite that line with the version included.
Any suggestions or ways to do this would be much appreciated.
image
You will need to rewrite the whole file, except the byte length of the file remains the same after your modification. Since this is not guaranteed to be the case or to find out is too cumbersome here is a simple procedure:
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
public class Lab1 {
public static void main(String[] args) {
String chromVersion = "myChromeVersion";
try {
Path path = Paths.get("C:\\whatever\\path\\toYourFile.txt");
List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
int lineToModify = 31;
lines.set(lineToModify, lines.get(lineToModify)+ chromVersion);
Files.write(path, lines, StandardCharsets.UTF_8);
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
Note that this is not the best way to go for very large files. But for the small file you have it is not an issue.
Related
I have a Folder inbox, now every time a user chooses to send a mail, it creates a new file in inbox. Now my code, creates only one file for all mails. I thought of making a counter but that will just reset to zero every time it runs.
int mailCounter = 0;
String mailPath = ""+"Server\\"+rcptEmail+"\\Inbox\\Mail"+mailCounter;
File Mail = new File(mailPath);
mailCounter++;
try{
if(Mail.createNewFile()){
System.out.println("Mail created");
}
}catch(Exception error){}
Any ideas on how to make it such as
Inbox
Mail1
Mail2
Mail3
As suggested in the comment to your question, i.e. list the files in folder Inbox and get the largest number. Then add 1 (one) to it and create a new file whose name is Mail plus the number.
The below code simply shows how to get the largest number. I presume you can figure out the rest. Note that the code uses the stream API.
Explanations after the code.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.Optional;
public class ListFile {
public static void main(String[] args) {
String mailPath = ""+"Server\\"+rcptEmail+"\\Inbox";
Path dir = Paths.get(mailPath);
try {
Optional<Integer> maybeMax = Files.list(dir)
.map(path -> path.getFileName().toString())
.filter(name -> name.startsWith("Mail"))
.map(path -> Integer.valueOf(path.substring(4)))
.sorted()
.max(Comparator.comparingInt(Integer::intValue));
int max = maybeMax.orElse(Integer.valueOf(0));
}
catch (IOException xIo) {
xIo.printStackTrace();
}
}
}
Create a stream of all the files in folder Inbox.
Filter out all the files whose name does not start with Mail.
Get the number at the end of the file name.
Sort the numbers.
Get the largest number.
I'm trying to generate a PDF that contains Arabic text using PDFBox Apache but the text is generated as separated characters because Apache parses given Arabic string to a sequence of general 'official' Unicode characters that is equivalent to the isolated form of Arabic characters.
Here is an example:
Target text to Write in PDF "Should be expected output in PDF File" -> جملة بالعربي
What I get in PDF File ->
I tried some methods but it's no use here are some of them:
1. Converting String to Stream of bits and trying to extract right values
2. Treating String a sequence of bytes with UTF-8 && UTF-16 and extracting values from them
There is some approach seems very promising to get the value "Unicode" of each character But it generate general "official Unicode" Here is what I mean
System.out.println( Integer.toHexString( (int)(new String("كلمة").charAt(1))) );
output is 644 but fee0 was the expected output because this character is in middle from then I should get the middle Unicode fee0
so what I want is some method that generates the correct Unicode not the just the official one
The very Left column in the first table in the following link represents the general Unicode
Arabic Unicode Tables Wikipedia
Notice:
The sample code in this answer might be outdated please refer to h q's answer for the working sample code
At First I will thank Tilman Hausherr and M.Prokhorov for showing me the library that made writing Arabic possible using PDFBox Apache.
This Answer will be divided into two Sections:
Downloading the library and installing it
How to use the library
Downloading the library and installing it
We are going to use ICU Library.
ICU stands for International Components for Unicode and it is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.
To download the Library go to the downloads page from here.
Choose the latest version of ICU4J as shown in the following image.
You will be transferred to another page and you will find a box with direct links of the needed components .Go ahead and download three Files you will find the highlighted in next image.
icu4j-docs.jar
icu4j-src.jar
icu4j.jar
The following explanation for creating and adding a library in Netbeans IDE
Navigate to the Toolbar and Click tools
Choose Libraries
At the bottom left you will find new Library button Create yours
Navigate to the library that you created in libraries list
Click it and add jar folders like that
Add icu4j.jar in class path
Add icu4j-src.jar in Sources
Add icu4j-docs.jar in Javadoc
View your opened projects from the very right
Expand the project that you want to use the library in
Right Click on the libraries folder and choose add library
Finally choose the library that you had just created.
Now you are ready to use the library just import what you want like that
import com.ibm.icu.What_You_Want_To_Import;
How to use the library
With ArabicShaping Class and reversing the String we can write a correct attached Arabic LINE
Here is the Code Notice the comments in the following code
import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.*;
public class Main {
public static void main(String[] args) throws IOException , ArabicShapingException
{
File f = new File("Arabic Font File of format.ttf");
PDDocument doc = new PDDocument();
PDPage Page = new PDPage();
doc.addPage(Page);
PDPageContentStream Writer = new PDPageContentStream(doc, Page);
Writer.beginText();
Writer.setFont(PDType0Font.load(doc, f), 20);
Writer.newLineAtOffset(0, 700);
//The Trick in the next Line of Code But Here is some few Notes first
//We have to reverse the string because PDFBox is Writting from the left but Arabic is RTL Language
//The output will be perfect except every line will be justified to the left "It's not hard to resolve this"
// So we have to write arabic string to pdf line by line..It will be like this
String s ="جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
Writer.showText(new StringBuilder(new ArabicShaping(reverseNumbersInString(ArabicShaping.LETTERS_SHAPE).shape(s))).reverse().toString());
// Note the previous line of code throws ArabicShapingExcpetion
Writer.endText();
Writer.close();
doc.save(new File("File_Test.pdf"));
doc.close();
}
}
Here is the output
I hope that I had gone over everything.
Update : After reversing make sure to reverse the numbers again in order to get the same proper number
Here is a couple of functions that could help
public static boolean isInt(String Input)
{
try{Integer.parseInt(Input);return true;}
catch(NumberFormatException e){return false;}
}
public static String reverseNumbersInString(String Input)
{
char[] Separated = Input.toCharArray();int i = 0;
String Result = "",Hold = "";
for(;i<Separated.length;i++ )
{
if(isInt(Separated[i]+"") == true)
{
while(i < Separated.length && (isInt(Separated[i]+"") == true || Separated[i] == '.' || Separated[i] == '-'))
{
Hold += Separated[i];
i++;
}
Result+=reverse(Hold);
Hold="";
}
else{Result+=Separated[i];}
}
return Result;
}
Here is a code that works. Download a sample font, e.g. trado.ttf
Make sure the pdfbox-app and icu4j jar files are in your classpath.
import java.io.File;
import java.io.IOException;
import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import com.ibm.icu.text.Bidi;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.*;
public class Main {
public static void main(String[] args) throws IOException , ArabicShapingException
{
File f = new File("trado.ttf");
PDDocument doc = new PDDocument();
PDPage Page = new PDPage();
doc.addPage(Page);
PDPageContentStream Writer = new PDPageContentStream(doc, Page);
Writer.beginText();
Writer.setFont(PDType0Font.load(doc, f), 20);
Writer.newLineAtOffset(0, 700);
String s ="جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
Writer.showText(bidiReorder(s));
Writer.endText();
Writer.close();
doc.save(new File("File_Test.pdf"));
doc.close();
}
private static String bidiReorder(String text)
{
try {
Bidi bidi = new Bidi((new ArabicShaping(ArabicShaping.LETTERS_SHAPE)).shape(text), 127);
bidi.setReorderingMode(0);
return bidi.writeReordered(2);
}
catch (ArabicShapingException ase3) {
return text;
}
}
}
I'm trying to generate a PDF that contains Arabic text using PDFBox Apache but the text is generated as separated characters because Apache parses given Arabic string to a sequence of general 'official' Unicode characters that is equivalent to the isolated form of Arabic characters.
Here is an example:
Target text to Write in PDF "Should be expected output in PDF File" -> جملة بالعربي
What I get in PDF File ->
I tried some methods but it's no use here are some of them:
1. Converting String to Stream of bits and trying to extract right values
2. Treating String a sequence of bytes with UTF-8 && UTF-16 and extracting values from them
There is some approach seems very promising to get the value "Unicode" of each character But it generate general "official Unicode" Here is what I mean
System.out.println( Integer.toHexString( (int)(new String("كلمة").charAt(1))) );
output is 644 but fee0 was the expected output because this character is in middle from then I should get the middle Unicode fee0
so what I want is some method that generates the correct Unicode not the just the official one
The very Left column in the first table in the following link represents the general Unicode
Arabic Unicode Tables Wikipedia
Notice:
The sample code in this answer might be outdated please refer to h q's answer for the working sample code
At First I will thank Tilman Hausherr and M.Prokhorov for showing me the library that made writing Arabic possible using PDFBox Apache.
This Answer will be divided into two Sections:
Downloading the library and installing it
How to use the library
Downloading the library and installing it
We are going to use ICU Library.
ICU stands for International Components for Unicode and it is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.
To download the Library go to the downloads page from here.
Choose the latest version of ICU4J as shown in the following image.
You will be transferred to another page and you will find a box with direct links of the needed components .Go ahead and download three Files you will find the highlighted in next image.
icu4j-docs.jar
icu4j-src.jar
icu4j.jar
The following explanation for creating and adding a library in Netbeans IDE
Navigate to the Toolbar and Click tools
Choose Libraries
At the bottom left you will find new Library button Create yours
Navigate to the library that you created in libraries list
Click it and add jar folders like that
Add icu4j.jar in class path
Add icu4j-src.jar in Sources
Add icu4j-docs.jar in Javadoc
View your opened projects from the very right
Expand the project that you want to use the library in
Right Click on the libraries folder and choose add library
Finally choose the library that you had just created.
Now you are ready to use the library just import what you want like that
import com.ibm.icu.What_You_Want_To_Import;
How to use the library
With ArabicShaping Class and reversing the String we can write a correct attached Arabic LINE
Here is the Code Notice the comments in the following code
import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.*;
public class Main {
public static void main(String[] args) throws IOException , ArabicShapingException
{
File f = new File("Arabic Font File of format.ttf");
PDDocument doc = new PDDocument();
PDPage Page = new PDPage();
doc.addPage(Page);
PDPageContentStream Writer = new PDPageContentStream(doc, Page);
Writer.beginText();
Writer.setFont(PDType0Font.load(doc, f), 20);
Writer.newLineAtOffset(0, 700);
//The Trick in the next Line of Code But Here is some few Notes first
//We have to reverse the string because PDFBox is Writting from the left but Arabic is RTL Language
//The output will be perfect except every line will be justified to the left "It's not hard to resolve this"
// So we have to write arabic string to pdf line by line..It will be like this
String s ="جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
Writer.showText(new StringBuilder(new ArabicShaping(reverseNumbersInString(ArabicShaping.LETTERS_SHAPE).shape(s))).reverse().toString());
// Note the previous line of code throws ArabicShapingExcpetion
Writer.endText();
Writer.close();
doc.save(new File("File_Test.pdf"));
doc.close();
}
}
Here is the output
I hope that I had gone over everything.
Update : After reversing make sure to reverse the numbers again in order to get the same proper number
Here is a couple of functions that could help
public static boolean isInt(String Input)
{
try{Integer.parseInt(Input);return true;}
catch(NumberFormatException e){return false;}
}
public static String reverseNumbersInString(String Input)
{
char[] Separated = Input.toCharArray();int i = 0;
String Result = "",Hold = "";
for(;i<Separated.length;i++ )
{
if(isInt(Separated[i]+"") == true)
{
while(i < Separated.length && (isInt(Separated[i]+"") == true || Separated[i] == '.' || Separated[i] == '-'))
{
Hold += Separated[i];
i++;
}
Result+=reverse(Hold);
Hold="";
}
else{Result+=Separated[i];}
}
return Result;
}
Here is a code that works. Download a sample font, e.g. trado.ttf
Make sure the pdfbox-app and icu4j jar files are in your classpath.
import java.io.File;
import java.io.IOException;
import com.ibm.icu.text.ArabicShaping;
import com.ibm.icu.text.ArabicShapingException;
import com.ibm.icu.text.Bidi;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.*;
public class Main {
public static void main(String[] args) throws IOException , ArabicShapingException
{
File f = new File("trado.ttf");
PDDocument doc = new PDDocument();
PDPage Page = new PDPage();
doc.addPage(Page);
PDPageContentStream Writer = new PDPageContentStream(doc, Page);
Writer.beginText();
Writer.setFont(PDType0Font.load(doc, f), 20);
Writer.newLineAtOffset(0, 700);
String s ="جملة بالعربي لتجربة الكلاس اللذي يساعد علي وصل الحروف بشكل صحيح";
Writer.showText(bidiReorder(s));
Writer.endText();
Writer.close();
doc.save(new File("File_Test.pdf"));
doc.close();
}
private static String bidiReorder(String text)
{
try {
Bidi bidi = new Bidi((new ArabicShaping(ArabicShaping.LETTERS_SHAPE)).shape(text), 127);
bidi.setReorderingMode(0);
return bidi.writeReordered(2);
}
catch (ArabicShapingException ase3) {
return text;
}
}
}
I have a small java program that reads a given file with data and converts it to a csv file.
I've been trying to use the arrow symbols: ↑, ↓, → and ← (Alt+24 to 27) but unless the program is run from within Netbeans (Using F6), they will always come out as '?' in the resulting csv file.
I have tried using the unicodes, eg "\u2190" but it makes no difference.
Anyone know why this is happening?
As requested, here is a sample code that gives the same issue. This wont work when run using the .jar file, just creating a csv file containing '?', however running from within Netbeans works.
import java.io.FileNotFoundException;
import java.io.PrintWriter;
public class Sample {
String fileOutName = "testresult.csv";
/**
* #param args the command line arguments
*/
public static void main(String[] args) throws FileNotFoundException {
Sample test = new Sample();
test.saveTheArrow();
}
public void saveTheArrow() {
try (PrintWriter outputStream = new PrintWriter(fileOutName)) {
outputStream.print("←");
outputStream.close();
}
catch (FileNotFoundException e) {
// Do nothing
}
}
}
new PrintWriter(fileOutName) uses the default charset of the JVM - you may have different defaults in Netbeans and in the console.
Google Sheet uses UTF_8 according to this thread so it would make sense to save your file using that character set:
Files.write(Paths.get("testresult.csv"), "←".getBytes(UTF_8));
Using the "<-" character in your editor is for sure not the desired byte 0x27.
Use
outputStream.print( new String( new byte[] { 0x27}, StandardCharsets.US_ASCII);
Basically I intend to extract the entire category tree in Wikipedia under the root node "Economics" using Wikipedia API sandbox. I don't need the content of the articles, I just need few basic details like pageid, title, revision history (at some later stage of my work). As of now I can extract it level by level but what I want is a recursive/iterative function which does it.
Each category contains a categories and articles (like each root contains nodes and leaves).
I wrote one code to extract the first level into files. one file contains the articles, second folder contains the name of categories (daughters of the root which can be further sub-classified).
Then I went into level and extracted their categories and articles and sub-categories using similar code.
The code remains similar in each case but its the scalability. I need to reach the lowest leaves of all nodes. So i need a recursion which continuously checks till the end.
I labelled files which contains categories as 'c_', so I can provide the condition while extracting different levels.
Now for some reason it has entered into a deadlock and keeps adding same things again and again. I need a way out of the deadlock.
package wikiCrawl;
import java.awt.List;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Scanner;
import org.apache.commons.io.FileUtils;
import org.json.CDL;
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
public class SubCrawl
{
public static void main(String[] args) throws IOException, InterruptedException, JSONException
{ File file = new File("C:/Users/User/Desktop/Root/Economics_2.txt");
crawlfile(file);
}
public static void crawlfile(File food) throws JSONException, IOException ,InterruptedException
{
ArrayList<String> cat_list =new ArrayList <String>();
Scanner scanner_cat = new Scanner(food);
scanner_cat.useDelimiter("\n");
while (scanner_cat.hasNext())
{
String scan_n = scanner_cat.next();
if(scan_n.indexOf(":")>-1)
cat_list.add(scan_n.substring(scan_n.indexOf(":")+1));
}
System.out.println(cat_list);
//get the categories in different languages
URL category_json;
for (int i_cat=0; i_cat<cat_list.size();i_cat++)
{
category_json = new URL("https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtitle=Category%3A"+cat_list.get(i_cat).replaceAll(" ", "%20").trim()+"&cmlimit=500"); //.trim() removes trailing and following whitespaces
System.out.println(category_json);
HttpURLConnection urlConnection = (HttpURLConnection) category_json.openConnection(); //Opens the connection to the URL so clients can communicate with the resources.
BufferedReader reader = new BufferedReader (new InputStreamReader(category_json.openStream()));
String line;
String diff = "";
while ((line = reader.readLine()) != null)
{
System.out.println(line);
diff=diff+line;
}
urlConnection.disconnect();
reader.close();
JSONArray jsonarray_cat = new JSONArray (diff.substring(diff.indexOf("[{\"pageid\"")));
System.out.println(jsonarray_cat);
//Loop categories
for (int i_url = 0; i_url<jsonarray_cat.length();i_url++) //jSONarray is an array of json objects, we are looping through each object
{
//Get the URL _part (Categorie isn't correct)
int pageid=Integer.parseInt(jsonarray_cat.getJSONObject(i_url).getString("pageid")); //this can be written in a much better way
System.out.println(pageid);
String title=jsonarray_cat.getJSONObject(i_url).getString("title");
System.out.println(title);
File food_year= new File("C:/Users/User/Desktop/Root/"+cat_list.get(i_cat).replaceAll(" ", "_").trim()+".txt");
File food_year2= new File("C:/Users/User/Desktop/Root/c_"+cat_list.get(i_cat).replaceAll(" ", "_").trim()+".txt");
food_year.createNewFile();
food_year2.createNewFile();
BufferedWriter writer = new BufferedWriter (new OutputStreamWriter(new FileOutputStream(food_year, true)));
BufferedWriter writer2 = new BufferedWriter (new OutputStreamWriter(new FileOutputStream(food_year2, true)));
if (title.contains("Category:"))
{
writer2.write(pageid+";"+title);
writer2.newLine();
writer2.flush();
crawlfile(food_year2);
}
else
{
writer.write(pageid+";"+title);
writer.newLine();
writer.flush();
}
}
}
}
}
For starters this might be too big a demand on the wikimedia servers. There are over a million categories (1) and you need to read Wikipedia:Database download - Why not just retrieve data from wikipedia.org at runtime. You would need to throttle your uses to about 1 per second or risk getting blocked. This means it would take about 11 days to get the full tree.
It would be much better to use the standard dumps at https://dumps.wikimedia.org/enwiki/ these will be easier to read and process and you don't need to put a big load on the server.
Still better is to get a Wikimedia Labs account, which allow you to run queries on a replication of the database servers or scripts on the dumps without having to download some very big files.
To get just the economics categories then its easiest to go via https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Economics this has 1242 categories. You may find it easier to use the list of categories there and build the tree from there.
This will be better than a recursive approach. The problem with the wikipedia category system is that it is not really a tree, with plenty of loops. I would not be surprised if you keep following categories you will end up getting the most of wikipedia.