Related
question
How to get each individual replacement result from a Regex replacement?
Example:
String regexMatchedWord = matcher.group(); gives me the current match.
But is there something like String regexMatchedSubstitution = matcher.currentMatchedReplacementResult(); that gives me the replacement result for the current match?
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String content_SearchOn = "Sample sentence: snake, snail, snow, spider";
String regexStrSubstitution = "$2$3$1";
String regexStrMatchFor = "(s)(.)(.)";
Matcher matcher = Pattern.compile(regexStrMatchFor).matcher(content_SearchOn);
ArrayList<String> arr_regexMatchedWord = new ArrayList<>();
ArrayList<String> arr_regexMatchedSubstitution = new ArrayList<>();
StringBuilder sb_content_Replaced = new StringBuilder();
while (matcher.find()) {
String regexMatchedWord = matcher.group();
arr_regexMatchedWord.add(regexMatchedWord);
matcher.appendReplacement(sb_content_Replaced, regexStrSubstitution);
String regexMatchedSubstitution = null; // << What should I put here -- to get each replacement result?
arr_regexMatchedSubstitution.add(regexMatchedSubstitution);
}
matcher.appendTail(sb_content_Replaced);
System.out.println(sb_content_Replaced); // Sample enstence: naske, nasil, nosw, pisder
System.out.println(arr_regexMatchedWord); // [sen, sna, sna, sno, spi]
System.out.println(arr_regexMatchedSubstitution); // [ens, nas, nas, nos, pis] // << expect
}
}
comments
If Java cannot do this, is there another language that can (JavaScript? Python?)?
Update: potential solution (workaround)
(As discussed in the comments.) One simple approach might be:
convert each $1 in the replacement string into a group(1) call programmatically,
but you have to watch out for escape characters such as \, which have a special meaning in the replacement string.
Another approach might be:
use reflection to somehow get at the intermediate result that java.util.regex.Matcher.appendReplacement(StringBuilder, String) builds via appendExpandedReplacement(replacement, result). Note that result is a local variable there, which reflection cannot read directly:
public Matcher appendReplacement(StringBuilder sb, String replacement) {
// If no match, return error
if (first < 0)
throw new IllegalStateException("No match available");
StringBuilder result = new StringBuilder();
appendExpandedReplacement(replacement, result);
// Append the intervening text
sb.append(text, lastAppendPosition, first);
// Append the match substitution
sb.append(result);
lastAppendPosition = last;
modCount++;
return this;
}
Or:
Record the end index before the append, then read from that index after the append to get the appended replacement.
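The first idea above (turning each $1 into a group(1) call) could look roughly like the sketch below. The class and method names are hypothetical, and it is deliberately simplified: it assumes single-digit group references and backslash escapes only, so ${name} references and multi-digit group numbers are not handled the way Matcher handles them.

```java
import java.util.regex.MatchResult;

public class TemplateExpander {
    /**
     * Expands a replacement template such as "$2$3$1" against one match.
     * Assumption: only $0..$9 and backslash escapes are supported.
     * Unlike Matcher, a lone '$' is copied literally instead of throwing.
     */
    static String expand(MatchResult match, String template) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            boolean hasNext = i + 1 < template.length();
            if (c == '\\' && hasNext) {
                // A backslash makes the next character literal.
                out.append(template.charAt(++i));
            } else if (c == '$' && hasNext
                    && template.charAt(i + 1) >= '0' && template.charAt(i + 1) <= '9') {
                int group = template.charAt(++i) - '0';
                String value = match.group(group);
                // A group that did not participate in the match contributes nothing.
                if (value != null) {
                    out.append(value);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Inside the while (matcher.find()) loop this would be called as TemplateExpander.expand(matcher, regexStrSubstitution), since Matcher implements MatchResult.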
solution (workaround) Java implementation
#logic::
Record the end index before the append, then read from that index after the append to get the appended replacement.
#code::
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String content_SearchOn = "Sample sentence: snake, snail, snow, spider";
String regexStrSubstitution = "$2$3x$1";
String regexStrMatchFor = "(s)(.)(.).";
Matcher matcher = Pattern.compile(regexStrMatchFor).matcher(content_SearchOn);
ArrayList<String> arr_regexMatchedWord = new ArrayList<>();
ArrayList<String> arr_regexMatchedSubstitution = new ArrayList<>();
StringBuilder sb_content_SearchOn = new StringBuilder(content_SearchOn);
StringBuilder sb_content_Replaced = new StringBuilder();
String content_OriPlusCurrAppendSubsti = null;
StringBuilder sb_CurrAppendSubsti_buffer = null;
int indStart_g0_curr = -1;
int indEnd_g0_curr = -1;
int indStart_g0_prev = -1;
int indEnd_g0_prev = -1;
while (matcher.find()) {
// #>>>#
String regexMatchedWord = matcher.group();
indStart_g0_curr = matcher.start();
indEnd_g0_curr = matcher.end();
arr_regexMatchedWord.add(regexMatchedWord);
// #>>>
// #main[business logic]::
// <strike> length_sb_content_Replaced_prev = sb_content_Replaced.length();
// <strike> String regexMatchedSubstitution = sb_content_Replaced.substring(length_sb_content_Replaced_prev);
// #note: it appends both the `the intervening text` + `the match substitution` ...
// Note: calling matcher.appendReplacement(new StringBuilder(), regexStrSubstitution) a second time for the same match was tried and is broken, so each match is appended exactly once below.
//~ matcher.appendReplacement(sb_content_Replaced, regexStrSubstitution);
sb_CurrAppendSubsti_buffer = new StringBuilder();
matcher.appendReplacement(sb_CurrAppendSubsti_buffer, regexStrSubstitution + "_$0");
sb_content_Replaced.append(sb_CurrAppendSubsti_buffer);
// #main;;
// #>>>
// #main[get the individual replacement result]::
//~ String regexMatchedSubstitution = null; // << What should I put here -- to get each replacement result?
if (indEnd_g0_prev == -1) {
content_OriPlusCurrAppendSubsti = "";
} else {
content_OriPlusCurrAppendSubsti = sb_content_SearchOn.substring(0, indEnd_g0_prev);
}
content_OriPlusCurrAppendSubsti += sb_CurrAppendSubsti_buffer;
String regexMatchedSubstitution = content_OriPlusCurrAppendSubsti.substring(indStart_g0_curr);
arr_regexMatchedSubstitution.add(regexMatchedSubstitution);
// #main;;
// #>>>#
indStart_g0_prev = indStart_g0_curr;
indEnd_g0_prev = indEnd_g0_curr;
}
matcher.appendTail(sb_content_Replaced);
//
System.out.println(sb_content_Replaced); // Sample enxs_sentence: naxs_snake, naxs_snail, noxs_snow, pixs_spider
System.out.println(arr_regexMatchedWord); // [sent, snak, snai, snow, spid]
System.out.println(arr_regexMatchedSubstitution); // [enxs_sent, naxs_snak, naxs_snai, noxs_snow, pixs_spid] // << expect
}
}
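The bookkeeping in the workaround above can probably be reduced, since appendReplacement writes exactly two things into the builder: the intervening text since the previous match, then the expanded replacement. The sketch below (class and method names are hypothetical, Java 9+ assumed for the StringBuilder overloads) appends each match into a scratch builder and cuts off the intervening text by its known length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplacementCapture {
    /**
     * Replaces every match of regex in input using template, writes the
     * full result into out, and returns the expanded replacement of each match.
     */
    static List<String> expandEach(String input, String regex, String template, StringBuilder out) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> replacements = new ArrayList<>();
        int lastEnd = 0; // mirrors the matcher's internal append position
        while (matcher.find()) {
            StringBuilder scratch = new StringBuilder();
            // scratch now holds: intervening text + expanded replacement
            matcher.appendReplacement(scratch, template);
            int interveningLength = matcher.start() - lastEnd;
            replacements.add(scratch.substring(interveningLength));
            out.append(scratch);
            lastEnd = matcher.end();
        }
        matcher.appendTail(out);
        return replacements;
    }
}
```

With the question's input, pattern (s)(.)(.) and template $2$3$1, this returns [ens, nas, nas, nos, pis] and leaves "Sample enstence: naske, nasil, nosw, pisder" in out.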
solution (workaround) JavaScript implementation
#logic::
Brute force it with a hard-coded delimiter string that does not occur in the input:
replaceAll() -- wrap each replacement in the delimiters during replacement
matchAll() -- find the replacements that are enclosed in the delimiters
#code (moved from specific example to a general class [here])::
class RegexUtil {
// https://stackoverflow.com/questions/3561493/is-there-a-regexp-escape-function-in-javascript
// https://stackoverflow.com/questions/3446170/escape-string-for-use-in-javascript-regex
/**
* @param {String} literal_string
* @returns {String}
*/
static escapeRegex(literal_string) {
return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}
/**
* @param {String} string
* @returns {String}
*/
static escapeRegexReplacement(string) {
return string.replace(/\$/g, '$$$$');
}
/**
* @param {String} content_SearchOn
* @param {RegExp} regexMatchFor
* @param {String} regexStrSubstitution
* @returns {String[]}
*/
static get_RegexMatchedReplacement(content_SearchOn, regexMatchFor, regexStrSubstitution) {
const arr_regexMatchedSubstitution = [];
let time_now;
let delim_regexMatchedSub_left;
let delim_regexMatchedSub_right;
/** @type {IterableIterator<RegExpMatchArray>} */ let itr;
let i = 0;
do {
i++;
if (i === 50) {
throw new Error('Many loops tried, Unable to brute force with hardcode string indicator in regex. (The chance of this happening is nearly impossible.)');
}
time_now = Date.now();
delim_regexMatchedSub_left = '#drmsL' + time_now + ';';
delim_regexMatchedSub_right = '#drmsR' + time_now + ';';
itr = content_SearchOn.matchAll(new RegExp(RegexUtil.escapeRegex(delim_regexMatchedSub_left) + '|' + RegexUtil.escapeRegex(delim_regexMatchedSub_right), 'g'));
} while (itr.next().done !== true);
const content_Replaced_WithDelimiter = content_SearchOn.replaceAll(regexMatchFor, RegexUtil.escapeRegexReplacement(delim_regexMatchedSub_left) + regexStrSubstitution + RegexUtil.escapeRegexReplacement(delim_regexMatchedSub_right));
itr = content_Replaced_WithDelimiter.matchAll(new RegExp(RegexUtil.escapeRegex(delim_regexMatchedSub_left) + '(.*?)' + RegexUtil.escapeRegex(delim_regexMatchedSub_right), 'gs')); // need flag s
for (const matcher_curr of itr) {
arr_regexMatchedSubstitution.push(matcher_curr[1]);
}
return arr_regexMatchedSubstitution;
}
}
#code (moved from specific example [here] to a general class)::
class RegexUtil {
// https://stackoverflow.com/questions/3561493/is-there-a-regexp-escape-function-in-javascript
// https://stackoverflow.com/questions/3446170/escape-string-for-use-in-javascript-regex
/**
* @param {String} literal_string
* @returns {String}
*/
static escapeRegex(literal_string) {
return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}
/**
* @param {String} string
* @returns {String}
*/
static escapeRegexReplacement(string) {
return string.replace(/\$/g, '$$$$');
}
}
// TODO: think again about a generic way to escape delimiters that have a special meaning in a regex
const content_SearchOn = 'Sample sentence: snake, snail, snow, spider';
let regexStrSubstitution = '$2$3x$1';
const regexStrMatchFor = '(s)(.)(.).';
const regexFlag = 'gmd';
regexStrSubstitution += '_$&';
const arr_regexMatchedWord = [];
const arr_regexMatchedSubstitution = [];
let time_now;
let delim_regexMatchedSub_left;
let delim_regexMatchedSub_right;
/** @type {IterableIterator<RegExpMatchArray>} */ let itr;
let i = 0;
do {
i++;
if (i === 50) {
throw new Error('Many loops tried, Unable to brute force with hardcode string indicator in regex. (The chance of this happening is nearly impossible.)');
}
time_now = Date.now();
delim_regexMatchedSub_left = '#drmsL' + time_now + ';';
delim_regexMatchedSub_right = '#drmsR' + time_now + ';';
itr = content_SearchOn.matchAll(new RegExp(RegexUtil.escapeRegex(delim_regexMatchedSub_left) + '|' + RegexUtil.escapeRegex(delim_regexMatchedSub_right), 'g'));
} while (itr.next().done !== true);
const content_Replaced_WithDelimiter = content_SearchOn.replaceAll(new RegExp(regexStrMatchFor, regexFlag), RegexUtil.escapeRegexReplacement(delim_regexMatchedSub_left) + regexStrSubstitution + RegexUtil.escapeRegexReplacement(delim_regexMatchedSub_right));
itr = content_Replaced_WithDelimiter.matchAll(new RegExp(RegexUtil.escapeRegex(delim_regexMatchedSub_left) + '(.*?)' + RegexUtil.escapeRegex(delim_regexMatchedSub_right), 'gs')); // need flag s
for (const matcher_curr of itr) {
arr_regexMatchedSubstitution.push(matcher_curr[1]);
}
itr = content_SearchOn.matchAll(new RegExp(regexStrMatchFor, regexFlag));
for (const matcher_curr of itr) {
arr_regexMatchedWord.push(matcher_curr[0]);
}
const content_Replaced = content_SearchOn.replaceAll(new RegExp(regexStrMatchFor, regexFlag), regexStrSubstitution);
console.log(content_Replaced); // Sample enxs_sentence: naxs_snake, naxs_snail, noxs_snow, pixs_spider
console.log(arr_regexMatchedWord); // [sent, snak, snai, snow, spid]
console.log(arr_regexMatchedSubstitution); // [enxs_sent, naxs_snak, naxs_snai, noxs_snow, pixs_spid] // << expect
comment (minor)
The reason for brute forcing it with a hard-coded delimiter string is that JavaScript is even worse off here:
it does not provide appendReplacement()
the replacer callback function does not support patterns such as $1
Note: The above-mentioned special replacement patterns do not apply for strings returned from the replacer function.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#specifying_a_function_as_the_replacement
This makes the following idea useless (it could have worked, but it is complex and slow) ::
for (const matcher_curr of itr_matcher) {
ind_ReplaceOnlyCurrOne++;
let ind_Match = -1;
function replace_OnlyOneWord_c_for_get_regexMatchedSubstitution(...args) {
ind_Match++;
/** @type {String} */ const g0 = args[0];
if (ind_Match === ind_ReplaceOnlyCurrOne) {
// prettier-ignore
let arg_last = args.at(-1); let ind_g0; let content_SearchOn; let groups;
// prettier-ignore
if (typeof arg_last === 'string') { content_SearchOn = arg_last; ind_g0 = args.at(-2); } else { groups = arg_last; content_SearchOn = args.at(-2); ind_g0 = args.at(-3); }
arr_regexMatchedWord.push(g0);
indStart_g0 = ind_g0;
indEnd_g0 = ind_g0 + g0.length;
return replacer_main(args);
} else {
return RegexUtil.escapeRegexReplacement(g0);
}
}
const content_ReplacedOnlyCurrOne__P1_Pm_P2 = content_SearchOn.replaceAll(new RegExp(regexStrMatchFor, regexFlag), replace_OnlyOneWord_c_for_get_regexMatchedSubstitution);
const Pm_P2 = content_ReplacedOnlyCurrOne__P1_Pm_P2.slice(indStart_g0);
const P2 = content_SearchOn.slice(indEnd_g0);
const regexMatchedSubstitution__Pm = Pm_P2.replaceAll(new RegExp(RegexUtil.escapeRegex(P2)+'$', 'g'), '');
arr_regexMatchedSubstitution.push(regexMatchedSubstitution__Pm);
}
You can use replaceAll(Function<MatchResult, String> replacer) on a Matcher (available since Java 9) to "intercept" the replacement:
String input = "Sample sentence: snake, snail, snow, spider";
List<String> matches = new ArrayList<>();
String result = Pattern.compile("(s)(.)(.)").matcher(input)
.replaceAll(mr -> {
matches.add(mr.group());
return mr.group(2) + mr.group(3) + mr.group(1);
});
System.out.println(result);
System.out.println(matches);
Output:
Sample enstence: naske, nasil, nosw, pisder
[sen, sna, sna, sno, spi]
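The same callback can record the replacement strings as well, which is what the question asks for. The sketch below is one possible way, with a hypothetical class and method name. It builds the replacement from the groups by hand rather than from a "$2$3$1" template, and wraps the return value in Matcher.quoteReplacement because the string returned by the function is still interpreted for $ and \ by replaceAll.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllCapture {
    /**
     * Replaces every "(s)(.)(.)" match by its groups in the order 2, 3, 1.
     * Each match is added to matches and each replacement to replacements.
     */
    static String replaceAndRecord(String input, List<String> matches, List<String> replacements) {
        return Pattern.compile("(s)(.)(.)").matcher(input).replaceAll(mr -> {
            String replacement = mr.group(2) + mr.group(3) + mr.group(1);
            matches.add(mr.group());
            replacements.add(replacement);
            // Quote so that a literal '$' or '\' in the captured text is not re-interpreted.
            return Matcher.quoteReplacement(replacement);
        });
    }
}
```

For the sample sentence, replacements ends up as [ens, nas, nas, nos, pis] alongside the matches list shown above.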
I'm developing an application that should verify signatures of pdf files. The application should detect full history of updates done on the file content before each signature is applied.
For example:
Signer 1 signed the plain pdf file
Signer 2 added comment to the signed file, then signed it
How can the application detect that Signer 2 added a comment before his signature?
I have tried to use iText and PDFBox.
As already explained in a comment, neither iText nor PDFBox bring along a high-level API telling you what changed in an incremental update in terms of UI objects (comments, text content, ...).
You can use them to render the different revisions of the PDF as bitmaps and compare those images.
Or you can use them to tell you the changes in terms of low level COS objects (dictionaries, arrays, numbers, strings, ...).
But analyzing the changes in those images or low level objects and determining their meaning in terms of UI objects, that e.g. a comment and only a comment has been added, is highly non-trivial.
In response you asked
Can you explain more, how can I detect changes in low level COS objects.
What to Compare And What Changes to Consider
First of all you have to be clear about what document states you can compare to detect changes.
The PDF format allows changes to be appended to a PDF in so-called incremental updates. This allows changes to signed documents without cryptographically breaking those signatures, as the original signed bytes are left as is.
There can be more incremental updates in-between, though, which are not signed; e.g. the "Changes for version 2" might include multiple incremental updates.
One might consider comparing the revisions created by arbitrary incremental updates. The problem here is, though, that you cannot identify the person who applied an incremental update without signature.
Thus, it usually makes more sense to compare the signed revisions only and to hold each signer responsible for all changes since the previous signed revision. The only exception here is the whole file, which as the current version of the PDF is of special interest even if there is no signature covering all of it.
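To get a first impression of how many incremental updates a file contains, one can look for the %%EOF markers that end each file section. The sketch below is only a rough heuristic with hypothetical names, using nothing but the JDK: it assumes the whole file fits in memory, and it can overcount, because linearized PDFs carry an extra %%EOF and the marker may also occur inside stream data. It does not replace the signature-based revision extraction.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RevisionEnds {
    /**
     * Returns the offset just behind each "%%EOF" marker in the given bytes.
     * Each offset is a candidate end of a revision (heuristic only).
     */
    static List<Integer> findEofOffsets(byte[] pdf) {
        byte[] marker = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        List<Integer> ends = new ArrayList<>();
        for (int i = 0; i + marker.length <= pdf.length; i++) {
            boolean hit = true;
            for (int j = 0; j < marker.length; j++) {
                if (pdf[i + j] != marker[j]) {
                    hit = false;
                    break;
                }
            }
            if (hit) {
                ends.add(i + marker.length);
                i += marker.length - 1; // continue the scan behind the marker
            }
        }
        return ends;
    }
}
```

The bytes could come from java.nio.file.Files.readAllBytes. Each returned offset marks a prefix of the file that may be a complete earlier revision.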
Next you have to decide what you consider a change. In particular:
Is every object override in an incremental update a change? Even those that override the original object with an identical copy?
What about changes that make a direct object indirect (or vice versa) but keep all contents and references intact?
What about addition of new objects that are not referred to from anywhere in the standard structure?
What about addition of objects that are not referenced from the cross reference streams or tables?
What about addition of data that's not following PDF syntax at all?
If you are indeed interested in such changes, too, existing PDF libraries out-of-the-box usually don't provide you the means to determine them; you most likely will at least have to change their code for traversing the chain of cross reference tables/streams or even analyze the file bytes in the update directly.
If you are not interested in such changes, though, there usually is no need to change or replace library routines.
As the enumerated and similar changes make no difference when the PDF is processed by specification-conforming PDF processors, one can usually ignore such changes.
If this is your position, too, the following example tool might give you a starting point.
An Example Tool Based on iText 7
With the limitations explained above you can compare signed revisions of a PDF using iText 7 without changes to the library by loading the revisions to compare into separate PdfDocument instances and recursively comparing the PDF objects starting with the trailer.
I once implemented this as a small helper tool for personal use (so it is not completely finished yet, more work-in-progress). First there is the base class that allows comparing two arbitrary documents:
public class PdfCompare {
public static void main(String[] args) throws IOException {
System.out.printf("Comparing:\n* %s\n* %s\n", args[0], args[1]);
try ( PdfDocument pdfDocument1 = new PdfDocument(new PdfReader(args[0]));
PdfDocument pdfDocument2 = new PdfDocument(new PdfReader(args[1])) ) {
PdfCompare pdfCompare = new PdfCompare(pdfDocument1, pdfDocument2);
pdfCompare.compare();
List<Difference> differences = pdfCompare.getDifferences();
if (differences == null || differences.isEmpty()) {
System.out.println("No differences found.");
} else {
System.out.printf("%d differences found:\n", differences.size());
for (Difference difference : pdfCompare.getDifferences()) {
for (String element : difference.getPath()) {
System.out.print(element);
}
System.out.printf(" - %s\n", difference.getDescription());
}
}
}
}
public interface Difference {
List<String> getPath();
String getDescription();
}
public PdfCompare(PdfDocument pdfDocument1, PdfDocument pdfDocument2) {
trailer1 = pdfDocument1.getTrailer();
trailer2 = pdfDocument2.getTrailer();
}
public void compare() {
LOGGER.info("Starting comparison");
try {
compared.clear();
differences.clear();
LOGGER.info("START COMPARE");
compare(trailer1, trailer2, Collections.singletonList("trailer"));
LOGGER.info("START SHORTEN PATHS");
shortenPaths();
} finally {
LOGGER.info("Finished comparison and shortening");
}
}
public List<Difference> getDifferences() {
return differences;
}
class DifferenceImplSimple implements Difference {
DifferenceImplSimple(PdfObject object1, PdfObject object2, List<String> path, String description) {
this.pair = Pair.of(object1, object2);
this.path = path;
this.description = description;
}
@Override
public List<String> getPath() {
List<String> byPair = getShortestPath(pair);
return byPair != null ? byPair : shorten(path);
}
@Override public String getDescription() { return description; }
final Pair<PdfObject, PdfObject> pair;
final List<String> path;
final String description;
}
void compare(PdfObject object1, PdfObject object2, List<String> path) {
LOGGER.debug("Comparing objects at {}.", path);
if (object1 == null && object2 == null)
{
LOGGER.debug("Both objects are null at {}.", path);
return;
}
if (object1 == null) {
differences.add(new DifferenceImplSimple(object1, object2, path, "Missing in document 1"));
LOGGER.info("Object in document 1 is missing at {}.", path);
return;
}
if (object2 == null) {
differences.add(new DifferenceImplSimple(object1, object2, path, "Missing in document 2"));
LOGGER.info("Object in document 2 is missing at {}.", path);
return;
}
if (object1.getType() != object2.getType()) {
differences.add(new DifferenceImplSimple(object1, object2, path,
String.format("Type difference, %s in document 1 and %s in document 2",
getTypeName(object1.getType()), getTypeName(object2.getType()))));
LOGGER.info("Objects have different types at {}, {} and {}.", path, getTypeName(object1.getType()), getTypeName(object2.getType()));
return;
}
switch (object1.getType()) {
case PdfObject.ARRAY:
compareContents((PdfArray) object1, (PdfArray) object2, path);
break;
case PdfObject.DICTIONARY:
compareContents((PdfDictionary) object1, (PdfDictionary) object2, path);
break;
case PdfObject.STREAM:
compareContents((PdfStream)object1, (PdfStream)object2, path);
break;
case PdfObject.BOOLEAN:
case PdfObject.INDIRECT_REFERENCE:
case PdfObject.LITERAL:
case PdfObject.NAME:
case PdfObject.NULL:
case PdfObject.NUMBER:
case PdfObject.STRING:
compareContentsSimple(object1, object2, path);
break;
default:
differences.add(new DifferenceImplSimple(object1, object2, path, "Unknown object type " + object1.getType() + "; cannot compare"));
LOGGER.warn("Unknown object type at {}, {}.", path, object1.getType());
break;
}
}
void compareContents(PdfArray array1, PdfArray array2, List<String> path) {
int count1 = array1.size();
int count2 = array2.size();
if (count1 < count2) {
differences.add(new DifferenceImplSimple(array1, array2, path, "Document 1 misses " + (count2-count1) + " array entries"));
LOGGER.info("Array in document 1 is missing {} entries at {}.", (count2-count1), path);
}
if (count1 > count2) {
differences.add(new DifferenceImplSimple(array1, array2, path, "Document 2 misses " + (count1-count2) + " array entries"));
LOGGER.info("Array in document 2 is missing {} entries at {}.", (count1-count2), path);
}
if (alreadyCompared(array1, array2, path)) {
return;
}
int count = Math.min(count1, count2);
for (int i = 0; i < count; i++) {
compare(array1.get(i), array2.get(i), join(path, String.format("[%d]", i)));
}
}
void compareContents(PdfDictionary dictionary1, PdfDictionary dictionary2, List<String> path) {
List<PdfName> missing1 = new ArrayList<PdfName>(dictionary2.keySet());
missing1.removeAll(dictionary1.keySet());
if (!missing1.isEmpty()) {
differences.add(new DifferenceImplSimple(dictionary1, dictionary2, path, "Document 1 misses dictionary entries for " + missing1));
LOGGER.info("Dictionary in document 1 is missing entries at {} for {}.", path, missing1);
}
List<PdfName> missing2 = new ArrayList<PdfName>(dictionary1.keySet());
missing2.removeAll(dictionary2.keySet());
if (!missing2.isEmpty()) {
differences.add(new DifferenceImplSimple(dictionary1, dictionary2, path, "Document 2 misses dictionary entries for " + missing2));
LOGGER.info("Dictionary in document 2 is missing entries at {} for {}.", path, missing2);
}
if (alreadyCompared(dictionary1, dictionary2, path)) {
return;
}
List<PdfName> common = new ArrayList<PdfName>(dictionary1.keySet());
common.retainAll(dictionary2.keySet());
for (PdfName name : common) {
compare(dictionary1.get(name), dictionary2.get(name), join(path, name.toString()));
}
}
void compareContents(PdfStream stream1, PdfStream stream2, List<String> path) {
compareContents((PdfDictionary)stream1, (PdfDictionary)stream2, path);
byte[] bytes1 = stream1.getBytes();
byte[] bytes2 = stream2.getBytes();
if (!Arrays.equals(bytes1, bytes2)) {
differences.add(new DifferenceImplSimple(stream1, stream2, path, "Stream contents differ"));
LOGGER.info("Stream contents differ at {}.", path);
}
}
void compareContentsSimple(PdfObject object1, PdfObject object2, List<String> path) {
// vvv--- work-around for DEVSIX-4931, likely to be fixed in 7.1.15
if (object1 instanceof PdfNumber)
((PdfNumber)object1).getValue();
if (object2 instanceof PdfNumber)
((PdfNumber)object2).getValue();
// ^^^--- work-around for DEVSIX-4931, likely to be fixed in 7.1.15
if (!object1.equals(object2)) {
if (object1 instanceof PdfString) {
String string1 = object1.toString();
if (string1.length() > 40)
string1 = string1.substring(0, 40) + '\u22EF';
string1 = sanitize(string1);
String string2 = object2.toString();
if (string2.length() > 40)
string2 = string2.substring(0, 40) + '\u22EF';
string2 = sanitize(string2);
differences.add(new DifferenceImplSimple(object1, object2, path, String.format("String values differ, '%s' and '%s'", string1, string2)));
LOGGER.info("String values differ at {}, '{}' and '{}'.", path, string1, string2);
} else {
differences.add(new DifferenceImplSimple(object1, object2, path, String.format("Object values differ, '%s' and '%s'", object1, object2)));
LOGGER.info("Object values differ at {}, '{}' and '{}'.", path, object1, object2);
}
}
}
String sanitize(CharSequence string) {
char[] sanitized = new char[string.length()];
for (int i = 0; i < sanitized.length; i++) {
char c = string.charAt(i);
if (c >= 0 && c < ' ')
c = '\uFFFD';
sanitized[i] = c;
}
return new String(sanitized);
}
String getTypeName(byte type) {
switch (type) {
case PdfObject.ARRAY: return "ARRAY";
case PdfObject.BOOLEAN: return "BOOLEAN";
case PdfObject.DICTIONARY: return "DICTIONARY";
case PdfObject.LITERAL: return "LITERAL";
case PdfObject.INDIRECT_REFERENCE: return "REFERENCE";
case PdfObject.NAME: return "NAME";
case PdfObject.NULL: return "NULL";
case PdfObject.NUMBER: return "NUMBER";
case PdfObject.STREAM: return "STREAM";
case PdfObject.STRING: return "STRING";
default:
return "UNKNOWN";
}
}
List<String> join(List<String> path, String element) {
String[] array = path.toArray(new String[path.size() + 1]);
array[array.length-1] = element;
return Arrays.asList(array);
}
boolean alreadyCompared(PdfObject object1, PdfObject object2, List<String> path) {
Pair<PdfObject, PdfObject> pair = Pair.of(object1, object2);
if (compared.containsKey(pair)) {
//LOGGER.debug("Objects already compared at {}, previously at {}.", path, compared.get(pair));
Set<List<String>> paths = compared.get(pair);
boolean alreadyPresent = false;
// List<List<String>> toRemove = new ArrayList<>();
// for (List<String> formerPath : paths) {
// for (int i = 0; ; i++) {
// if (i == path.size()) {
// toRemove.add(formerPath);
// System.out.print('.');
// break;
// }
// if (i == formerPath.size()) {
// alreadyPresent = true;
// System.out.print(':');
// break;
// }
// if (!path.get(i).equals(formerPath.get(i)))
// break;
// }
// }
// paths.removeAll(toRemove);
if (!alreadyPresent)
paths.add(path);
return true;
}
compared.put(pair, new HashSet<>(Collections.singleton(path)));
return false;
}
List<String> getShortestPath(Pair<PdfObject, PdfObject> pair) {
Set<List<String>> paths = compared.get(pair);
//return (paths == null) ? null : Collections.min(paths, pathComparator);
return (paths == null || paths.isEmpty()) ? null : shortened.get(paths.stream().findFirst().get());
}
void shortenPaths() {
List<Map<List<String>, SortedSet<List<String>>>> data = new ArrayList<>();
for (Set<List<String>> set : compared.values()) {
SortedSet<List<String>> sortedSet = new TreeSet<List<String>>(pathComparator);
sortedSet.addAll(set);
for (List<String> path : sortedSet) {
while (path.size() >= data.size()) {
data.add(new HashMap<>());
}
SortedSet<List<String>> former = data.get(path.size()).put(path, sortedSet);
if (former != null) {
LOGGER.error("Path not well-defined for {}", path);
}
}
}
for (int pathSize = 3; pathSize < data.size(); pathSize++) {
for (Map.Entry<List<String>, SortedSet<List<String>>> pathEntry : data.get(pathSize).entrySet()) {
List<String> path = pathEntry.getKey();
SortedSet<List<String>> equivalents = pathEntry.getValue();
for (int subpathSize = 2; subpathSize < pathSize; subpathSize++) {
List<String> subpath = path.subList(0, subpathSize);
List<String> remainder = path.subList(subpathSize, pathSize);
SortedSet<List<String>> subequivalents = data.get(subpathSize).get(subpath);
if (subequivalents != null && subequivalents.size() > 1) {
List<String> subequivalent = subequivalents.first();
if (subequivalent.size() < subpathSize) {
List<String> replacement = join(subequivalent, remainder);
if (equivalents.add(replacement)) {
data.get(replacement.size()).put(replacement, equivalents);
}
}
}
}
}
}
shortened.clear();
for (Map<List<String>, SortedSet<List<String>>> singleLengthData : data) {
for (Map.Entry<List<String>, SortedSet<List<String>>> entry : singleLengthData.entrySet()) {
List<String> path = entry.getKey();
List<String> shortenedPath = entry.getValue().first();
shortened.put(path, shortenedPath);
}
}
}
List<String> join(List<String> path, List<String> elements) {
String[] array = path.toArray(new String[path.size() + elements.size()]);
for (int i = 0; i < elements.size(); i++) {
array[path.size() + i] = elements.get(i);
}
return Arrays.asList(array);
}
List<String> shorten(List<String> path) {
List<String> shortPath = path;
for (int subpathSize = path.size(); subpathSize > 2; subpathSize--) {
List<String> subpath = path.subList(0, subpathSize);
List<String> shortSubpath = shortened.get(subpath);
if (shortSubpath != null && shortSubpath.size() < subpathSize) {
List<String> remainder = path.subList(subpathSize, path.size());
List<String> replacement = join(shortSubpath, remainder);
if (replacement.size() < shortPath.size())
shortPath = replacement;
}
}
return shortPath;
}
final static Logger LOGGER = LoggerFactory.getLogger(PdfCompare.class);
final PdfDictionary trailer1;
final PdfDictionary trailer2;
final Map<Pair<PdfObject, PdfObject>, Set<List<String>>> compared = new HashMap<>();
final List<Difference> differences = new ArrayList<>();
final Map<List<String>, List<String>> shortened = new HashMap<>();
final static Comparator<List<String>> pathComparator = new Comparator<List<String>>() {
@Override
public int compare(List<String> o1, List<String> o2) {
int compare = Integer.compare(o1.size(), o2.size());
if (compare != 0)
return compare;
for (int i = 0; i < o1.size(); i++) {
compare = o1.get(i).compareTo(o2.get(i));
if (compare != 0)
return compare;
}
return 0;
}
};
}
(PdfCompare.java)
The tool to use this code for revision comparison is a subclass thereof:
public class PdfRevisionCompare extends PdfCompare {
public static void main(String[] args) throws IOException {
for (String arg : args) {
System.out.printf("\nComparing revisions of: %s\n***********************\n", arg);
try (PdfDocument pdfDocument = new PdfDocument(new PdfReader(arg))) {
SignatureUtil signatureUtil = new SignatureUtil(pdfDocument);
List<String> signatureNames = signatureUtil.getSignatureNames();
if (signatureNames.isEmpty()) {
System.out.println("No signed revisions detected. (no AcroForm)");
continue;
}
String previousRevision = signatureNames.get(0);
PdfDocument previousDocument = new PdfDocument(new PdfReader(signatureUtil.extractRevision(previousRevision)));
System.out.printf("* Initial signed revision: %s\n", previousRevision);
for (int i = 1; i < signatureNames.size(); i++) {
String currentRevision = signatureNames.get(i);
PdfDocument currentDocument = new PdfDocument(new PdfReader(signatureUtil.extractRevision(currentRevision)));
showDifferences(previousDocument, currentDocument);
System.out.printf("* Next signed revision (%d): %s\n", i+1, currentRevision);
previousDocument.close();
previousDocument = currentDocument;
previousRevision = currentRevision;
}
if (signatureUtil.signatureCoversWholeDocument(previousRevision)) {
System.out.println("No unsigned updates.");
} else {
showDifferences(previousDocument, pdfDocument);
System.out.println("* Final unsigned revision");
}
previousDocument.close();
}
}
}
static void showDifferences(PdfDocument previousDocument, PdfDocument currentDocument) {
PdfRevisionCompare pdfRevisionCompare = new PdfRevisionCompare(previousDocument, currentDocument);
pdfRevisionCompare.compare();
List<Difference> differences = pdfRevisionCompare.getDifferences();
if (differences == null || differences.isEmpty()) {
System.out.println("No differences found.");
} else {
System.out.printf("%d differences found:\n", differences.size());
for (Difference difference : differences) {
for (String element : difference.getPath()) {
System.out.print(element);
}
System.out.printf(" - %s\n", difference.getDescription());
}
}
}
public PdfRevisionCompare(PdfDocument pdfDocument1, PdfDocument pdfDocument2) {
super(pdfDocument1, pdfDocument2);
}
}
(PdfRevisionCompare.java)
I am having some trouble understanding how to properly add to an array. What I am trying to accomplish is adding the results from a method into an array and then running through the array.
Here is an example of what I need, or assume I need:
array{"websiteaddress","websiteaddress","websiteaddress","websiteaddress","websiteaddress"}
but instead I'm getting:
websiteaddress
websiteaddress
websiteaddress
websiteaddress
websiteaddress
websiteaddress
Here is the code I am working with:
private static final String webSiteURL = "https://websitename.com/";
//The path of the folder that you want to save the images to
private static final String folderPath = "C://path/";
private static final ArrayList<String> webPages = new ArrayList<>();
public static String[] thisIsAStringArray = {"https://websitename.com/"};
public static String[] tempArray = new String[ thisIsAStringArray.length + 1 ];
/**
*
* Method description:
* Date: Mar 17, 2018
* @param args
* @return void
*/
public static void main(String[] args) {
String path = folderPath + getPageTitle(webSiteURL).replaceAll(" ", "-");
pageLinks(webSiteURL);
System.out.println(thisIsAStringArray);
for(String web : thisIsAStringArray)
{
for(int n = 0; n < thisIsAStringArray.length - 1; n++)
{
System.out.println(thisIsAStringArray[n]);
getPageTitle(web);
pageLinks(web);
creatDirectory(folderPath, getPageTitle(web));
getsImagesAndSaves(path, web);
n++;
}
}
}
/**
*
* Method description: Get all the links on the page and put them into an array
* Date: Mar 16, 2018
* @param src
* @return void
*/
public static void pageLinks(String src)
{
try
{
URL url = new URL(src);
Document doc = Jsoup.parse(url, 3*1000);
Elements links = doc.select("a[href]"); // a with href
for (Element link : links)
{
System.out.println(link.attr("abs:href"));
String noHref = link.attr("abs:href");
for(int i = 0; i < thisIsAStringArray.length; i++)
{
tempArray[i] = thisIsAStringArray[i];
}
//thisIsAStringArray[i] = noHref;
tempArray[thisIsAStringArray.length] = noHref;
}
thisIsAStringArray = tempArray;
}
catch(Exception error)
{
System.out.println(error + " Something went wrong getting the links!");
}
}
}
Any help would be greatly appreciated and thank you in advance!
You have 2 arrays: thisIsAStringArray with size 1 and tempArray with size 2. Their size is fixed and cannot be changed! Now you have a loop:
for (Element link : links)
{
...
for(int i = 0; i < thisIsAStringArray.length; i++)
{
tempArray[i] = thisIsAStringArray[i];
}
}
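which reads: for each link you've found, i runs from 0 up to (but not including) thisIsAStringArray.length, which is 1, so inside the inner loop i only ever has the value 0. The single existing element is copied to index 0, and the assignment tempArray[thisIsAStringArray.length] = noHref then writes the link to index 1, overwriting whatever the previous iteration stored there.
You cannot change the size of an array at runtime. If you cannot tell ahead of time how many items you will have, use a List. Try something like this: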
ArrayList<String> myList = new ArrayList<>();
for (Element link : links)
myList.add(link.attr("abs:href")); // store the absolute URL string, not the Element itself
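As a rough, library-free sketch of the same idea: a List grows as items are added and can be converted to an array at the end if an array is really needed. The class name LinkCollector and the sample URLs are made up for illustration; in the question's code the strings would come from link.attr("abs:href").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LinkCollector {
    // Hypothetical helper: gathers any number of URLs without knowing the count up front.
    static List<String> collect(String... urls) {
        List<String> result = new ArrayList<>();
        for (String url : urls) {
            result.add(url); // the list resizes itself as needed
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> links = collect("https://websitename.com/a", "https://websitename.com/b");
        links.add("https://websitename.com/c");
        // Convert to a fixed-size array only once the final size is known.
        String[] asArray = links.toArray(new String[0]);
        System.out.println(Arrays.toString(asArray));
    }
}
```

Iterating with for (String web : links) then visits every collected URL exactly once, with no second index loop needed.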
I was wondering how I could implement an ArgumentCompleter such that if I complete a full and valid command, then it would begin tab completing for a new command.
I would have assumed it could be constructed doing something like this:
final ConsoleReader consoleReader = new ConsoleReader();
final ArgumentCompleter cyclicalArgument = new ArgumentCompleter();
cyclicalArgument.getCompleters().addAll(Arrays.asList(
new StringsCompleter("foo"),
new StringsCompleter("bar"),
cyclicalArgument));
consoleReader.addCompleter(cyclicalArgument);
consoleReader.readLine();
However, right now this stops working after tab completing the first foo bar.
Is anyone familiar enough with the library to tell me how I would go about implementing this? Or is there a known way to do this that I am missing? Also this is using JLine2.
That was quite a task :-)
It is handled by the completer you are using. The complete() method of the completer must search using only what comes after the last blank.
If you look for example at the FileNameCompleter of the library: this is not done at all, so you will find no completion, because the completer searches for <input1> <input2> and not only for <input2> :-)
You will have to do your own implementation of a completer that is able to find input2.
Additionally the CompletionHandler has to append what you found to what you already typed.
Here is a basic implementation changing the default FileNameCompleter:
protected int matchFiles(final String buffer, final String translated, final File[] files,
final List<CharSequence> candidates) {
// THIS IS NEW
String[] allWords = translated.split(" ");
String lastWord = allWords[allWords.length - 1];
// the lastWord is used when searching the files now
// ---
if (files == null) {
return -1;
}
int matches = 0;
// first pass: just count the matches
for (File file : files) {
if (file.getAbsolutePath().startsWith(lastWord)) {
matches++;
}
}
for (File file : files) {
if (file.getAbsolutePath().startsWith(lastWord)) {
CharSequence name = file.getName() + (matches == 1 && file.isDirectory() ? this.separator() : " ");
candidates.add(this.render(file, name).toString());
}
}
final int index = buffer.lastIndexOf(this.separator());
return index + this.separator().length();
}
And here the complete()-Method of the CompletionHandler changing the default CandidateListCompletionHandler:
@Override
public boolean complete(final ConsoleReader reader, final List<CharSequence> candidates, final int pos)
throws IOException {
CursorBuffer buf = reader.getCursorBuffer();
// THIS IS NEW
String[] allWords = buf.toString().split(" ");
String firstWords = "";
if (allWords.length > 1) {
for (int i = 0; i < allWords.length - 1; i++) {
firstWords += allWords[i] + " ";
}
}
//-----
// if there is only one completion, then fill in the buffer
if (candidates.size() == 1) {
String value = Ansi.stripAnsi(candidates.get(0).toString());
if (buf.cursor == buf.buffer.length() && this.printSpaceAfterFullCompletion && !value.endsWith(" ")) {
value += " ";
}
// fail if the only candidate is the same as the current buffer
if (value.equals(buf.toString())) {
return false;
}
CandidateListCompletionHandler.setBuffer(reader, firstWords + " " + value, pos);
return true;
} else if (candidates.size() > 1) {
String value = this.getUnambiguousCompletions(candidates);
CandidateListCompletionHandler.setBuffer(reader, value, pos);
}
CandidateListCompletionHandler.printCandidates(reader, candidates);
// redraw the current console buffer
reader.drawLine();
return true;
}
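The "search only what comes after the last blank" idea can be sketched without JLine at all. The helper below is hypothetical (the names LastWord, lastWord and leadingWords are not part of JLine); it only shows how a buffer might be divided into the part to keep and the part to complete.

```java
final class LastWord {
    // Index where the last word starts: one past the last blank, or 0 if there is none.
    static int lastWordStart(String buffer) {
        return buffer.lastIndexOf(' ') + 1;
    }

    // Text the completer should search for.
    static String lastWord(String buffer) {
        return buffer.substring(lastWordStart(buffer));
    }

    // Text that must be kept in front of the completed value.
    static String leadingWords(String buffer) {
        return buffer.substring(0, lastWordStart(buffer));
    }

    public static void main(String[] args) {
        System.out.println("[" + leadingWords("foo bar fo") + "][" + lastWord("foo bar fo") + "]");
        System.out.println("[" + leadingWords("foo bar ") + "][" + lastWord("foo bar ") + "]");
    }
}
```

One design difference from the split(" ") approach used above: String.split drops trailing empty tokens, so a buffer ending in a blank would still report the previous word as the last one, whereas lastIndexOf treats it as the start of a new, empty word.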
Given two absolute paths, e.g.
/var/data/stuff/xyz.dat
/var/data
How can one create a relative path that uses the second path as its base? In the example above, the result should be: ./stuff/xyz.dat
It's a little roundabout, but why not use URI? It has a relativize method which does all the necessary checks for you.
String path = "/var/data/stuff/xyz.dat";
String base = "/var/data";
String relative = new File(base).toURI().relativize(new File(path).toURI()).getPath();
// relative == "stuff/xyz.dat"
Please note that for file paths there's java.nio.file.Path#relativize since Java 1.7, as pointed out by @Jirka Meluzin in the other answer.
Since Java 7 you can use the relativize method:
import java.nio.file.Path;
import java.nio.file.Paths;
public class Test {
public static void main(String[] args) {
Path pathAbsolute = Paths.get("/var/data/stuff/xyz.dat");
Path pathBase = Paths.get("/var/data");
Path pathRelative = pathBase.relativize(pathAbsolute);
System.out.println(pathRelative);
}
}
Output:
stuff/xyz.dat
At the time of writing (June 2010), this was the only solution that passed my test cases. I can't guarantee that this solution is bug-free, but it does pass the included test cases. The method and tests I've written depend on the FilenameUtils class from Apache commons IO.
The solution was tested with Java 1.4. If you're using Java 1.5 (or higher) you should consider replacing StringBuffer with StringBuilder (if you're still using Java 1.4 you should consider a change of employer instead).
import java.io.File;
import java.util.regex.Pattern;
import org.apache.commons.io.FilenameUtils;
public class ResourceUtils {
/**
* Get the relative path from one file to another, specifying the directory separator.
* If one of the provided resources does not exist, it is assumed to be a file unless it ends with '/' or
* '\'.
*
* @param targetPath targetPath is calculated to this file
* @param basePath basePath is calculated from this file
* @param pathSeparator directory separator. The platform default is not assumed so that we can test Unix behaviour when running on Windows (for example)
* @return
*/
public static String getRelativePath(String targetPath, String basePath, String pathSeparator) {
// Normalize the paths
String normalizedTargetPath = FilenameUtils.normalizeNoEndSeparator(targetPath);
String normalizedBasePath = FilenameUtils.normalizeNoEndSeparator(basePath);
// Undo the changes to the separators made by normalization
if (pathSeparator.equals("/")) {
normalizedTargetPath = FilenameUtils.separatorsToUnix(normalizedTargetPath);
normalizedBasePath = FilenameUtils.separatorsToUnix(normalizedBasePath);
} else if (pathSeparator.equals("\\")) {
normalizedTargetPath = FilenameUtils.separatorsToWindows(normalizedTargetPath);
normalizedBasePath = FilenameUtils.separatorsToWindows(normalizedBasePath);
} else {
throw new IllegalArgumentException("Unrecognised dir separator '" + pathSeparator + "'");
}
String[] base = normalizedBasePath.split(Pattern.quote(pathSeparator));
String[] target = normalizedTargetPath.split(Pattern.quote(pathSeparator));
// First get all the common elements. Store them as a string,
// and also count how many of them there are.
StringBuffer common = new StringBuffer();
int commonIndex = 0;
while (commonIndex < target.length && commonIndex < base.length
&& target[commonIndex].equals(base[commonIndex])) {
common.append(target[commonIndex] + pathSeparator);
commonIndex++;
}
if (commonIndex == 0) {
// No single common path element. This most
// likely indicates differing drive letters, like C: and D:.
// These paths cannot be relativized.
throw new PathResolutionException("No common path element found for '" + normalizedTargetPath + "' and '" + normalizedBasePath
+ "'");
}
// The number of directories we have to backtrack depends on whether the base is a file or a dir
// For example, the relative path from
//
// /foo/bar/baz/gg/ff to /foo/bar/baz
//
// ".." if ff is a file
// "../.." if ff is a directory
//
// The following is a heuristic to figure out if the base refers to a file or dir. It's not perfect, because
// the resource referred to by this path may not actually exist, but it's the best I can do
boolean baseIsFile = true;
File baseResource = new File(normalizedBasePath);
if (baseResource.exists()) {
baseIsFile = baseResource.isFile();
} else if (basePath.endsWith(pathSeparator)) {
baseIsFile = false;
}
StringBuffer relative = new StringBuffer();
if (base.length != commonIndex) {
int numDirsUp = baseIsFile ? base.length - commonIndex - 1 : base.length - commonIndex;
for (int i = 0; i < numDirsUp; i++) {
relative.append(".." + pathSeparator);
}
}
relative.append(normalizedTargetPath.substring(common.length()));
return relative.toString();
}
static class PathResolutionException extends RuntimeException {
PathResolutionException(String msg) {
super(msg);
}
}
}
The test cases that this passes are
public void testGetRelativePathsUnix() {
assertEquals("stuff/xyz.dat", ResourceUtils.getRelativePath("/var/data/stuff/xyz.dat", "/var/data/", "/"));
assertEquals("../../b/c", ResourceUtils.getRelativePath("/a/b/c", "/a/x/y/", "/"));
assertEquals("../../b/c", ResourceUtils.getRelativePath("/m/n/o/a/b/c", "/m/n/o/a/x/y/", "/"));
}
public void testGetRelativePathFileToFile() {
String target = "C:\\Windows\\Boot\\Fonts\\chs_boot.ttf";
String base = "C:\\Windows\\Speech\\Common\\sapisvr.exe";
String relPath = ResourceUtils.getRelativePath(target, base, "\\");
assertEquals("..\\..\\Boot\\Fonts\\chs_boot.ttf", relPath);
}
public void testGetRelativePathDirectoryToFile() {
String target = "C:\\Windows\\Boot\\Fonts\\chs_boot.ttf";
String base = "C:\\Windows\\Speech\\Common\\";
String relPath = ResourceUtils.getRelativePath(target, base, "\\");
assertEquals("..\\..\\Boot\\Fonts\\chs_boot.ttf", relPath);
}
public void testGetRelativePathFileToDirectory() {
String target = "C:\\Windows\\Boot\\Fonts";
String base = "C:\\Windows\\Speech\\Common\\foo.txt";
String relPath = ResourceUtils.getRelativePath(target, base, "\\");
assertEquals("..\\..\\Boot\\Fonts", relPath);
}
public void testGetRelativePathDirectoryToDirectory() {
String target = "C:\\Windows\\Boot\\";
String base = "C:\\Windows\\Speech\\Common\\";
String expected = "..\\..\\Boot";
String relPath = ResourceUtils.getRelativePath(target, base, "\\");
assertEquals(expected, relPath);
}
public void testGetRelativePathDifferentDriveLetters() {
String target = "D:\\sources\\recovery\\RecEnv.exe";
String base = "C:\\Java\\workspace\\AcceptanceTests\\Standard test data\\geo\\";
try {
ResourceUtils.getRelativePath(target, base, "\\");
fail();
} catch (PathResolutionException ex) {
// expected exception
}
}
When using java.net.URI.relativize you should be aware of Java bug:
JDK-6226081 (URI should be able to relativize paths with partial roots)
At the moment, the relativize() method of URI will only relativize URIs when one is a prefix of the other.
Which essentially means java.net.URI.relativize will not create ".."'s for you.
In Java 7 and later you can simply use (and in contrast to URI, it is bug free):
Path#relativize(Path)
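To make the difference concrete, here is a small comparison sketch. The class and method names (RelativizeComparison, viaUri, viaPath) are only illustrative; the calls themselves are the standard java.net.URI and java.nio.file.Path methods.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

final class RelativizeComparison {
    // URI.relativize only strips a prefix; otherwise it returns the given URI unchanged.
    static URI viaUri(String base, String target) {
        return URI.create(base).relativize(URI.create(target));
    }

    // Path.relativize can also walk upwards with ".." elements.
    static Path viaPath(String base, String target) {
        return Paths.get(base).relativize(Paths.get(target));
    }

    public static void main(String[] args) {
        // Base is a prefix of the target: the prefix is removed.
        System.out.println(viaUri("/var/data/", "/var/data/stuff/xyz.dat"));
        // Base is not a prefix: the target comes back as-is, no ".." is produced.
        System.out.println(viaUri("/a/x/y/", "/a/b/c"));
        // Path handles the sibling case (separators follow the platform).
        System.out.println(viaPath("/a/x/y", "/a/b/c"));
    }
}
```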
The bug referred to in another answer is addressed by URIUtils in Apache HttpComponents
public static URI resolve(URI baseURI,
String reference)
Resolves a URI reference against a
base URI. Work-around for bug in
java.net.URI ()
If you know the second string is part of the first:
String s1 = "/var/data/stuff/xyz.dat";
String s2 = "/var/data";
String s3 = s1.substring(s2.length());
or if you really want the period at the beginning as in your example:
String s3 = ".".concat(s1.substring(s2.length()));
Recursion produces a smaller solution. This throws an exception if the result is impossible (e.g. different Windows disk) or impractical (root is only common directory.)
/**
* Computes the path for a file relative to a given base, or fails if the only shared
* directory is the root and the absolute form is better.
*
* @param base File that is the base for the result
* @param name File to be "relativized"
* @return the relative name
* @throws IOException if files have no common sub-directories, i.e. at best share the
* root prefix "/" or "C:\"
*/
public static String getRelativePath(File base, File name) throws IOException {
File parent = base.getParentFile();
if (parent == null) {
throw new IOException("No common directory");
}
String bpath = base.getCanonicalPath();
String fpath = name.getCanonicalPath();
if (fpath.startsWith(bpath)) {
return fpath.substring(bpath.length() + 1);
} else {
return (".." + File.separator + getRelativePath(parent, name));
}
}
Here is a solution that needs no other library:
Path sourceFile = Paths.get("some/common/path/example/a/b/c/f1.txt");
Path targetFile = Paths.get("some/common/path/example/d/e/f2.txt");
Path relativePath = sourceFile.relativize(targetFile);
System.out.println(relativePath);
Outputs
..\..\..\..\d\e\f2.txt
[EDIT] Actually, it outputs one more ..\ than wanted because the source is a file, not a directory. The correct solution for my case is:
Path sourceFile = Paths.get(new File("some/common/path/example/a/b/c/f1.txt").getParent());
Path targetFile = Paths.get("some/common/path/example/d/e/f2.txt");
Path relativePath = sourceFile.relativize(targetFile);
System.out.println(relativePath);
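The same correction can be written with Path alone. This is a minimal sketch under the assumption that the first argument always names a file; the class name RelativeFromFile and the fallback for a path without a parent are choices made for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class RelativeFromFile {
    // Assumes 'sourceFile' names a file, so its parent directory is the real starting point.
    static Path relativeFromFile(Path sourceFile, Path target) {
        Path startDir = sourceFile.getParent();
        if (startDir == null) {
            // Simplification: a bare file name has no parent, so the target is returned as-is.
            return target;
        }
        return startDir.relativize(target);
    }

    public static void main(String[] args) {
        Path source = Paths.get("some/common/path/example/a/b/c/f1.txt");
        Path target = Paths.get("some/common/path/example/d/e/f2.txt");
        // Three levels up (c, b, a) instead of four, then down into d/e.
        System.out.println(relativeFromFile(source, target));
    }
}
```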
My version is loosely based on Matt and Steve's versions:
/**
* Returns the path of one File relative to another.
*
* @param target the target directory
* @param base the base directory
* @return target's path relative to the base directory
* @throws IOException if an error occurs while resolving the files' canonical names
*/
public static File getRelativeFile(File target, File base) throws IOException
{
String[] baseComponents = base.getCanonicalPath().split(Pattern.quote(File.separator));
String[] targetComponents = target.getCanonicalPath().split(Pattern.quote(File.separator));
// skip common components
int index = 0;
for (; index < targetComponents.length && index < baseComponents.length; ++index)
{
if (!targetComponents[index].equals(baseComponents[index]))
break;
}
StringBuilder result = new StringBuilder();
if (index != baseComponents.length)
{
// backtrack to base directory
for (int i = index; i < baseComponents.length; ++i)
result.append(".." + File.separator);
}
for (; index < targetComponents.length; ++index)
result.append(targetComponents[index] + File.separator);
if (!target.getPath().endsWith("/") && !target.getPath().endsWith("\\"))
{
// remove final path separator
result.delete(result.length() - File.separator.length(), result.length());
}
return new File(result.toString());
}
Matt B's solution gets the number of directories to backtrack wrong -- it should be the length of the base path minus the number of common path elements, minus one (for the last path element, which is either a filename or a trailing "" generated by split). It happens to work with /a/b/c/ and /a/x/y/, but replace the arguments with /m/n/o/a/b/c/ and /m/n/o/a/x/y/ and you will see the problem.
Also, it needs an else break inside the first for loop, or it will mishandle paths that happen to have matching directory names, such as /a/b/c/d/ and /x/y/c/z -- the c is in the same slot in both arrays, but is not an actual match.
All these solutions lack the ability to handle paths that cannot be relativized to one another because they have incompatible roots, such as C:\foo\bar and D:\baz\quux. Probably only an issue on Windows, but worth noting.
I spent far longer on this than I intended, but that's okay. I actually needed this for work, so thank you to everyone who has chimed in, and I'm sure there will be corrections to this version too!
public static String getRelativePath(String targetPath, String basePath,
String pathSeparator) {
// We need the -1 argument to split to make sure we get a trailing
// "" token if the base ends in the path separator and is therefore
// a directory. We require directory paths to end in the path
// separator -- otherwise they are indistinguishable from files.
String[] base = basePath.split(Pattern.quote(pathSeparator), -1);
String[] target = targetPath.split(Pattern.quote(pathSeparator), 0);
// First get all the common elements. Store them as a string,
// and also count how many of them there are.
String common = "";
int commonIndex = 0;
for (int i = 0; i < target.length && i < base.length; i++) {
if (target[i].equals(base[i])) {
common += target[i] + pathSeparator;
commonIndex++;
}
else break;
}
if (commonIndex == 0)
{
// Whoops -- not even a single common path element. This most
// likely indicates differing drive letters, like C: and D:.
// These paths cannot be relativized. Return the target path.
return targetPath;
// This should never happen when all absolute paths
// begin with / as in *nix.
}
String relative = "";
if (base.length == commonIndex) {
// Uncomment this if you prefer that a relative path start with ./
//relative = "." + pathSeparator;
}
else {
int numDirsUp = base.length - commonIndex - 1;
// The number of directories we have to backtrack is the length of
// the base path MINUS the number of common path elements, minus
// one because the last element in the path isn't a directory.
for (int i = 1; i <= (numDirsUp); i++) {
relative += ".." + pathSeparator;
}
}
relative += targetPath.substring(common.length());
return relative;
}
And here are tests to cover several cases:
public void testGetRelativePathsUnixy()
{
assertEquals("stuff/xyz.dat", FileUtils.getRelativePath(
"/var/data/stuff/xyz.dat", "/var/data/", "/"));
assertEquals("../../b/c", FileUtils.getRelativePath(
"/a/b/c", "/a/x/y/", "/"));
assertEquals("../../b/c", FileUtils.getRelativePath(
"/m/n/o/a/b/c", "/m/n/o/a/x/y/", "/"));
}
public void testGetRelativePathFileToFile()
{
String target = "C:\\Windows\\Boot\\Fonts\\chs_boot.ttf";
String base = "C:\\Windows\\Speech\\Common\\sapisvr.exe";
String relPath = FileUtils.getRelativePath(target, base, "\\");
assertEquals("..\\..\\Boot\\Fonts\\chs_boot.ttf", relPath);
}
public void testGetRelativePathDirectoryToFile()
{
String target = "C:\\Windows\\Boot\\Fonts\\chs_boot.ttf";
String base = "C:\\Windows\\Speech\\Common\\";
String relPath = FileUtils.getRelativePath(target, base, "\\");
assertEquals("..\\..\\Boot\\Fonts\\chs_boot.ttf", relPath);
}
public void testGetRelativePathDifferentDriveLetters()
{
String target = "D:\\sources\\recovery\\RecEnv.exe";
String base = "C:\\Java\\workspace\\AcceptanceTests\\Standard test data\\geo\\";
// Should just return the target path because of the incompatible roots.
String relPath = FileUtils.getRelativePath(target, base, "\\");
assertEquals(target, relPath);
}
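On Java 7 and later, the incompatible-roots case can also be guarded with java.nio.file.Path. The following is only a sketch of that "return the target when the roots differ" policy; the class and method names are made up for the example, and paths containing ".." may need normalize() first.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Objects;

final class SafeRelativize {
    // Returns target relative to base, or target unchanged when the two paths
    // do not share a root (different drive letters, or absolute vs. relative).
    static Path relativizeOrTarget(Path base, Path target) {
        if (!Objects.equals(base.getRoot(), target.getRoot())) {
            return target; // cannot be relativized, same policy as the code above
        }
        return base.relativize(target);
    }

    public static void main(String[] args) {
        System.out.println(relativizeOrTarget(Paths.get("/a/x"), Paths.get("/a/b")));
        System.out.println(relativizeOrTarget(Paths.get("/a/x"), Paths.get("rel/b")));
    }
}
```

getRoot() returns null for relative paths, which is why Objects.equals is used for the comparison instead of calling equals on the root directly.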
Actually my other answer didn't work if the target path wasn't a child of the base path.
This should work.
public class RelativePathFinder {
public static String getRelativePath(String targetPath, String basePath,
String pathSeparator) {
// find common path
String[] target = targetPath.split(pathSeparator);
String[] base = basePath.split(pathSeparator);
String common = "";
int commonIndex = 0;
for (int i = 0; i < target.length && i < base.length; i++) {
if (target[i].equals(base[i])) {
common += target[i] + pathSeparator;
commonIndex++;
}
}
String relative = "";
// is the target a child directory of the base directory?
// i.e., target = /a/b/c/d, base = /a/b/
if (commonIndex == base.length) {
relative = "." + pathSeparator + targetPath.substring(common.length());
}
else {
// determine how many directories we have to backtrack
for (int i = 1; i <= commonIndex; i++) {
relative += ".." + pathSeparator;
}
relative += targetPath.substring(common.length());
}
return relative;
}
public static String getRelativePath(String targetPath, String basePath) {
return getRelativePath(targetPath, basePath, File.separator); // File.pathSeparator is the PATH-list delimiter (":" or ";"), not the directory separator
}
}
public class RelativePathFinderTest extends TestCase {
public void testGetRelativePath() {
assertEquals("./stuff/xyz.dat", RelativePathFinder.getRelativePath(
"/var/data/stuff/xyz.dat", "/var/data/", "/"));
assertEquals("../../b/c", RelativePathFinder.getRelativePath("/a/b/c",
"/a/x/y/", "/"));
}
}
Cool!! I need a bit of code like this but for comparing directory paths on Linux machines. I found that this wasn't working in situations where a parent directory was the target.
Here is a directory friendly version of the method:
public static String getRelativePath(String targetPath, String basePath,
String pathSeparator) {
boolean isDir = false;
{
File f = new File(targetPath);
isDir = f.isDirectory();
}
// We need the -1 argument to split to make sure we get a trailing
// "" token if the base ends in the path separator and is therefore
// a directory. We require directory paths to end in the path
// separator -- otherwise they are indistinguishable from files.
String[] base = basePath.split(Pattern.quote(pathSeparator), -1);
String[] target = targetPath.split(Pattern.quote(pathSeparator), 0);
// First get all the common elements. Store them as a string,
// and also count how many of them there are.
String common = "";
int commonIndex = 0;
for (int i = 0; i < target.length && i < base.length; i++) {
if (target[i].equals(base[i])) {
common += target[i] + pathSeparator;
commonIndex++;
}
else break;
}
if (commonIndex == 0)
{
// Whoops -- not even a single common path element. This most
// likely indicates differing drive letters, like C: and D:.
// These paths cannot be relativized. Return the target path.
return targetPath;
// This should never happen when all absolute paths
// begin with / as in *nix.
}
String relative = "";
if (base.length == commonIndex) {
// Comment this out if you prefer that a relative path not start with ./
relative = "." + pathSeparator;
}
else {
int numDirsUp = base.length - commonIndex - (isDir?0:1); /* only subtract 1 if it is a file. */
// The number of directories we have to backtrack is the length of
// the base path MINUS the number of common path elements, minus
// one because the last element in the path isn't a directory.
for (int i = 1; i <= (numDirsUp); i++) {
relative += ".." + pathSeparator;
}
}
//if we are comparing directories then we
if (targetPath.length() > common.length()) {
//it's OK, it isn't a directory
relative += targetPath.substring(common.length());
}
return relative;
}
I'm assuming you have fromPath (an absolute path for a folder) and toPath (an absolute path for a folder/file), and you're looking for a path that will represent the file/folder in toPath as a relative path from fromPath (your current working directory is fromPath). Then something like this should work:
public static String getRelativePath(String fromPath, String toPath) {
// This weirdness is because String.split() takes a regex, so a separator of '\' must be escaped
String regexCharacter = File.separator;
if (File.separatorChar == '\\') {
regexCharacter = "\\\\";
}
String[] fromSplit = fromPath.split(regexCharacter);
String[] toSplit = toPath.split(regexCharacter);
// Find the common path
int common = 0;
while (common < fromSplit.length && common < toSplit.length && fromSplit[common].equals(toSplit[common])) {
common++;
}
StringBuffer result = new StringBuffer(".");
// Work your way up the FROM path to common ground
for (int i = common; i < fromSplit.length; i++) {
result.append(File.separatorChar).append("..");
}
// Work your way down the TO path
for (int i = common; i < toSplit.length; i++) {
result.append(File.separatorChar).append(toSplit[i]);
}
return result.toString();
}
Lots of answers already here, but I found they didn't handle all cases, such as the base and target being the same. This function takes a base directory and a target path and returns the relative path. If no relative path exists, the target path is returned. File.separator is unnecessary.
public static String getRelativePath (String baseDir, String targetPath) {
String[] base = baseDir.replace('\\', '/').split("\\/");
targetPath = targetPath.replace('\\', '/');
String[] target = targetPath.split("\\/");
// Count common elements and their length.
int commonCount = 0, commonLength = 0, maxCount = Math.min(target.length, base.length);
while (commonCount < maxCount) {
String targetElement = target[commonCount];
if (!targetElement.equals(base[commonCount])) break;
commonCount++;
commonLength += targetElement.length() + 1; // Directory name length plus slash.
}
if (commonCount == 0) return targetPath; // No common path element.
int targetLength = targetPath.length();
int dirsUp = base.length - commonCount;
StringBuffer relative = new StringBuffer(dirsUp * 3 + targetLength - commonLength + 1);
for (int i = 0; i < dirsUp; i++)
relative.append("../");
if (commonLength < targetLength) relative.append(targetPath.substring(commonLength));
return relative.toString();
}
Here is a method that resolves a relative path from a base path, regardless of whether the two are under the same root or different roots:
public static String GetRelativePath(String path, String base){
final String SEP = "/";
// if base is not a directory -> return empty
if (!base.endsWith(SEP)){
return "";
}
// check if path is a file -> remove last "/" at the end of the method
boolean isfile = !path.endsWith(SEP);
// get URIs and split them by using the separator
String a = "";
String b = "";
try {
a = new File(base).getCanonicalFile().toURI().getPath();
b = new File(path).getCanonicalFile().toURI().getPath();
} catch (IOException e) {
e.printStackTrace();
}
String[] basePaths = a.split(SEP);
String[] otherPaths = b.split(SEP);
// check common part
int n = 0;
for(; n < basePaths.length && n < otherPaths.length; n ++)
{
if( basePaths[n].equals(otherPaths[n]) == false )
break;
}
// compose the new path
StringBuffer tmp = new StringBuffer("");
for(int m = n; m < basePaths.length; m ++)
tmp.append(".."+SEP);
for(int m = n; m < otherPaths.length; m ++)
{
tmp.append(otherPaths[m]);
tmp.append(SEP);
}
// get path string
String result = tmp.toString();
// remove last "/" if path is a file
if (isfile && result.endsWith(SEP)){
result = result.substring(0,result.length()-1);
}
return result;
}
Passes Dónal's tests; the only change is that if there is no common root, it returns the target path (it could already be relative).
import static java.util.Arrays.asList;
import static java.util.Collections.nCopies;
import static org.apache.commons.io.FilenameUtils.normalizeNoEndSeparator;
import static org.apache.commons.io.FilenameUtils.separatorsToUnix;
import static org.apache.commons.lang3.StringUtils.getCommonPrefix;
import static org.apache.commons.lang3.StringUtils.isBlank;
import static org.apache.commons.lang3.StringUtils.isNotEmpty;
import static org.apache.commons.lang3.StringUtils.join;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
public class ResourceUtils {
public static String getRelativePath(String targetPath, String basePath, String pathSeparator) {
File baseFile = new File(basePath);
if (baseFile.isFile() || !baseFile.exists() && !basePath.endsWith("/") && !basePath.endsWith("\\"))
basePath = baseFile.getParent();
String target = separatorsToUnix(normalizeNoEndSeparator(targetPath));
String base = separatorsToUnix(normalizeNoEndSeparator(basePath));
String commonPrefix = getCommonPrefix(target, base);
if (isBlank(commonPrefix))
return targetPath.replaceAll("/", pathSeparator);
target = target.replaceFirst(commonPrefix, "");
base = base.replaceFirst(commonPrefix, "");
List<String> result = new ArrayList<>();
if (isNotEmpty(base))
result.addAll(nCopies(base.split("/").length, ".."));
result.addAll(asList(target.replaceFirst("^/", "").split("/")));
return join(result, pathSeparator);
}
}
If you're writing a Maven plugin, you can use Plexus' PathTool:
import org.codehaus.plexus.util.PathTool;
String relativeFilePath = PathTool.getRelativeFilePath(file1, file2);
If Paths is not available, e.g. on a JRE 1.5 runtime or in a Maven plugin:
package org.afc.util;
import java.io.File;
import java.util.LinkedList;
import java.util.List;
public class FileUtil {
public static String getRelativePath(String basePath, String filePath) {
return getRelativePath(new File(basePath), new File(filePath));
}
public static String getRelativePath(File base, File file) {
List<String> bases = new LinkedList<String>();
bases.add(0, base.getName());
for (File parent = base.getParentFile(); parent != null; parent = parent.getParentFile()) {
bases.add(0, parent.getName());
}
List<String> files = new LinkedList<String>();
files.add(0, file.getName());
for (File parent = file.getParentFile(); parent != null; parent = parent.getParentFile()) {
files.add(0, parent.getName());
}
int overlapIndex = 0;
while (overlapIndex < bases.size() && overlapIndex < files.size() && bases.get(overlapIndex).equals(files.get(overlapIndex))) {
overlapIndex++;
}
StringBuilder relativePath = new StringBuilder();
for (int i = overlapIndex; i < bases.size(); i++) {
relativePath.append("..").append(File.separatorChar);
}
for (int i = overlapIndex; i < files.size(); i++) {
relativePath.append(files.get(i)).append(File.separatorChar);
}
relativePath.deleteCharAt(relativePath.length() - 1);
return relativePath.toString();
}
}
I know this is a bit late, but I created a solution that works with any Java version.
public static String getRelativePath(File root, File file)
{
String path = file.getPath();
String rootPath = root.getPath();
boolean plus1 = path.contains(File.separator);
return path.substring(path.indexOf(rootPath) + rootPath.length() + (plus1 ? 1 : 0));
}
org.apache.ant has a FileUtils class with a getRelativePath method. Haven't tried it myself yet, but could be worthwhile to check it out.
http://javadoc.haefelinger.it/org.apache.ant/1.7.1/org/apache/tools/ant/util/FileUtils.html#getRelativePath(java.io.File, java.io.File)
private String relative(String left, String right){
String[] lefts = left.split("/");
String[] rights = right.split("/");
int min = Math.min(lefts.length, rights.length);
int commonIdx = -1;
for(int i = 0; i < min; i++){
if(commonIdx < 0 && !lefts[i].equals(rights[i])){
commonIdx = i - 1;
break;
}
}
if(commonIdx < 0){
return null;
}
StringBuilder sb = new StringBuilder(Math.max(left.length(), right.length()));
sb.append(left).append("/");
for(int i = commonIdx + 1; i < lefts.length;i++){
sb.append("../");
}
for(int i = commonIdx + 1; i < rights.length;i++){
sb.append(rights[i]).append("/");
}
return sb.deleteCharAt(sb.length() -1).toString();
}
Pseudo-code:
Split the strings by the path separator ("/")
Find the greatest common path by iterating through the result of the split string (so you'd end up with "/var/data" or "/a" in your two examples)
return "." + whicheverPathIsLonger.substring(commonPath.length);
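The pseudo-code above could be turned into Java roughly as follows. This is a sketch, not a general solution: it assumes "/" as the separator and, like the pseudo-code, only handles the case where the base is a prefix of the target (no ".." is ever produced). The names CommonPrefixRelativizer and relativeTo are made up for the example.

```java
final class CommonPrefixRelativizer {
    // Returns target relative to base, prefixed with ".", when base is a prefix of target.
    static String relativeTo(String base, String target) {
        String[] baseParts = base.split("/");
        String[] targetParts = target.split("/");

        // Count the leading path elements the two paths share.
        int common = 0;
        while (common < baseParts.length && common < targetParts.length
                && baseParts[common].equals(targetParts[common])) {
            common++;
        }
        if (common < baseParts.length) {
            // Assumption of this sketch: anything else would need ".." handling.
            throw new IllegalArgumentException("base is not a prefix of target");
        }

        // Append whatever is left of the target after the common part.
        StringBuilder relative = new StringBuilder(".");
        for (int i = common; i < targetParts.length; i++) {
            relative.append('/').append(targetParts[i]);
        }
        return relative.toString();
    }

    public static void main(String[] args) {
        System.out.println(relativeTo("/var/data", "/var/data/stuff/xyz.dat")); // ./stuff/xyz.dat
    }
}
```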