Well, this question may sound stupid, but I spent hours researching a solution and couldn't find one, so if anyone knows, that would be GREAT!!!
I successfully read an ARC file (from the Common Crawl dataset). With arcHeader.getUrl() I get all the URLs. However, I don't understand whether the 'outgoing' links from that particular URL are in there as well, and if they are, how to get them.
[PS] By 'outgoing' I mean the URLs that the whole page contains, e.g. in ads, content, etc. Does the Common Crawl ARC file contain those, and if so, how do I get them?
Thanks in advance!
EDIT: I solved this by reading the HTML content and extracting all the links from it. It wasn't that difficult!
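For anyone landing here with the same question, here is a minimal sketch of the "read the HTML and pull out the links" step. It assumes you already have the page's HTML as a String (in an ARC record the body typically starts with the HTTP response headers, which you would skip first). It uses the HTML parser bundled with the JDK (javax.swing.text.html.parser.ParserDelegator), not any Common Crawl API, and the names LinkExtractor and extractLinks are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public final class LinkExtractor {
    private LinkExtractor() {}

    /** Returns the absolute targets of all <a href> links in the page, in document order. */
    public static List<String> extractLinks(String html, String pageUrl) throws IOException {
        List<String> links = new ArrayList<>();
        URI base = URI.create(pageUrl);
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        try {
                            // Resolve relative links such as "/about" against the page URL.
                            links.add(base.resolve(href.toString().trim()).toString());
                        } catch (IllegalArgumentException e) {
                            // Skip hrefs that are not valid URIs (common in crawled pages).
                        }
                    }
                }
            }
        };
        try (Reader reader = new StringReader(html)) {
            new ParserDelegator().parse(reader, callback, true);
        }
        return links;
    }
}
```

You would pass the record's URL as pageUrl so relative links come out absolute. The JDK parser only understands HTML 3.2, so for messy modern pages a dedicated parser such as jsoup may be more robust.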
Hello, friends!
I am very new to programming and I am trying to learn Kotlin.
I am trying to get text from a web page into a TextView in my activity.
For example, the page 'https://makeandsellsoap.blogspot.com/p/ratings.html' contains some text, and I would like to show it in a TextView (i.e. read the page content).
How can I get started with this? Can anyone point me in the right direction? I've been trying examples from the internet for three days now, to no avail.
I hope my question is well presented; if not, please forgive and correct me. I'll learn along the way.
Thanks in advance.
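A rough sketch of the two steps involved (download the page, then turn its HTML into plain text), written in Java, which Kotlin code can call directly. The class name PageText is hypothetical, and the regex-based tag stripping is a crude stand-in for a real HTML parser: good enough to experiment with a simple page, but it will also return navigation and sidebar text, so you would still need to pick out the part you want.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public final class PageText {
    private PageText() {}

    /** Downloads a page as a string. Blocking network I/O: never call this on the main thread. */
    public static String fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(pageUrl).toURL().openConnection();
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            // Assumes UTF-8; a robust version would read the charset from the Content-Type header.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    /** Very rough HTML-to-text: drops scripts/styles and tags, decodes a few common entities. */
    public static String htmlToText(String html) {
        return html
                .replaceAll("(?is)<(script|style)\\b.*?</\\1\\s*>", " ")
                .replaceAll("(?s)<[^>]*>", " ")
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&#39;", "'")
                .replace("&amp;", "&") // last, so "&amp;lt;" becomes "&lt;" rather than "<"
                .replaceAll("\\s+", " ")
                .trim();
    }
}
```

On Android, the app needs `<uses-permission android:name="android.permission.INTERNET" />` in its manifest, fetch() has to run on a background thread, and the result must be posted back to the UI thread, e.g. `runOnUiThread(() -> textView.setText(PageText.htmlToText(html)))`. Android's own `android.text.Html.fromHtml(...)` is an alternative for the conversion step.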
I need to print some PNG files on paper, in Java, with each file's name printed as text below the image.
The document layout should follow this example:
How can I achieve this?
Thanks in advance
EDIT:
I have found this nice iText guide, and it seems to be exactly what I was looking for.
It's only for creating PDFs, so it won't send anything to the printer, but it could solve my problem nicely. I'll give it a try tomorrow.
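Since the goal is actual paper output, here is a minimal sketch using the JDK's own printing API (java.awt.print) instead of iText: one PNG per page, scaled down to fit the printable area, with the file name drawn underneath. ImageCaptionPrintable is a hypothetical name, and the layout (image at the top-left, caption one line below it) is an assumed approximation of the example.

```java
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.print.PageFormat;
import java.awt.print.Printable;
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;

public class ImageCaptionPrintable implements Printable {
    private final List<File> files;

    public ImageCaptionPrintable(List<File> files) {
        this.files = files;
    }

    @Override
    public int print(Graphics graphics, PageFormat pf, int pageIndex) throws PrinterException {
        // One PNG per page; tell the print system when we run out of pages.
        if (pageIndex >= files.size()) {
            return NO_SUCH_PAGE;
        }
        File file = files.get(pageIndex);
        BufferedImage image;
        try {
            image = ImageIO.read(file);
        } catch (IOException e) {
            throw new PrinterException("Cannot read " + file + ": " + e.getMessage());
        }
        if (image == null) {
            throw new PrinterException("Not a supported image: " + file);
        }

        Graphics2D g = (Graphics2D) graphics;
        // Move the origin into the printable area of the paper.
        g.translate(pf.getImageableX(), pf.getImageableY());

        int lineHeight = g.getFontMetrics().getHeight();
        double maxW = pf.getImageableWidth();
        double maxH = pf.getImageableHeight() - 2 * lineHeight; // leave room for the caption
        // Treats one image pixel as one point (1/72 inch); scale down, never up, to fit.
        double scale = Math.min(1.0, Math.min(maxW / image.getWidth(), maxH / image.getHeight()));
        int w = (int) (image.getWidth() * scale);
        int h = (int) (image.getHeight() * scale);

        g.drawImage(image, 0, 0, w, h, null);
        // File name as text directly below the image.
        g.drawString(file.getName(), 0, h + lineHeight);
        return PAGE_EXISTS;
    }

    public static void main(String[] args) throws PrinterException {
        List<File> files = new ArrayList<>();
        for (String arg : args) {
            files.add(new File(arg));
        }
        PrinterJob job = PrinterJob.getPrinterJob();
        job.setPrintable(new ImageCaptionPrintable(files));
        if (job.printDialog()) { // lets the user pick a printer
            job.print();
        }
    }
}
```

Run it as `java ImageCaptionPrintable a.png b.png`; the print dialog needs a desktop session. The print system may call print() more than once for the same page, so a real program would cache the loaded images instead of rereading them.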
First of all: my goal is simply to load a PDF, highlight words on a page, and show that page to the user as an image.
So far I parse the PDF with a custom text stripper to get all word positions with their coordinates (needed later to build a rectangle for each highlight).
After that I started generating PDAnnotationTextMarkup annotations. I'm now at the point where the annotations show up fine if I save the PDF to a file and open it in a PDF reader of my choice. But if I use the convertToImage method provided by PDFBox, I only get the plain page rendered, without the annotations.
After a little time on Google I found PDFBOX-2019, which was mentioned in another Stack Overflow question.
Now I'm looking for a workaround, because the ticket history suggests that nobody will fix that issue within the next year or so.
Does anybody have a good idea how to work around this and achieve my goal?
Thanks in advance,
Ben
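One common workaround is to skip annotations for the image altogether: render the plain page to a BufferedImage and paint the highlight rectangles yourself, using the word coordinates the custom text stripper already provides. Below is a minimal sketch in plain Java 2D. PageHighlighter is hypothetical, and it assumes the rectangles are in PDF points (1/72 inch) with the origin at the top-left of the page, so convert from PDF's bottom-left origin first if your stripper reports coordinates that way.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;
import java.util.List;

public final class PageHighlighter {
    private PageHighlighter() {}

    /**
     * Paints translucent highlight boxes over an already-rendered page image.
     * wordBoxes are assumed to be in PDF points with a top-left origin.
     */
    public static void highlight(BufferedImage page, List<? extends Rectangle2D> wordBoxes, float dpi) {
        double scale = dpi / 72.0; // PDF points -> image pixels
        Graphics2D g = page.createGraphics();
        try {
            // Translucent yellow, blended over the existing text (default SRC_OVER compositing).
            g.setColor(new Color(255, 255, 0, 100));
            for (Rectangle2D box : wordBoxes) {
                g.fill(new Rectangle2D.Double(
                        box.getX() * scale, box.getY() * scale,
                        box.getWidth() * scale, box.getHeight() * scale));
            }
        } finally {
            g.dispose();
        }
    }
}
```

The dpi argument has to match the resolution the page image was rendered at (for example, the value passed to PDFBox 2.x's PDFRenderer.renderImageWithDPI(pageIndex, dpi), if you move to that version).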
So here's the thing: I want to build a GridView containing a few items that the user can move and rearrange, pretty much like the one you see on the Android home screen.
I've looked in several places but found nothing. The closest thing I found was the home screen sample app on developer.android.com, but it doesn't have what I'm looking for (its items can't be rearranged).
Please give me a hint how this could be done. Code examples are also welcome.
Thanks in advance! :)
I think you should implement drag and drop.
GridView accepts an OnDragListener, so you could call myGridView.setOnDragListener(...) and handle the drag events however you want.
I also advise you to check the following link.
I hope this helps!
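Whatever drag mechanism you use, the drop handler ultimately has to move one item in the adapter's backing list and refresh the grid. A minimal sketch of that step in plain Java follows; GridReorder and moveItem are hypothetical names. In an OnDragListener you might compute the target on DragEvent.ACTION_DROP with gridView.pointToPosition((int) event.getX(), (int) event.getY()), ignore the drop if it returns AdapterView.INVALID_POSITION, then call moveItem and adapter.notifyDataSetChanged().

```java
import java.util.List;

public final class GridReorder {
    private GridReorder() {}

    /**
     * Moves the item at 'from' so that it ends up at index 'to',
     * shifting the items in between (home-screen style rearranging).
     */
    public static <T> void moveItem(List<T> items, int from, int to) {
        if (from < 0 || from >= items.size() || to < 0 || to >= items.size()) {
            throw new IndexOutOfBoundsException("from=" + from + ", to=" + to);
        }
        if (from == to) {
            return;
        }
        T item = items.remove(from); // everything after 'from' shifts left by one
        items.add(to, item);         // everything from 'to' onwards shifts right by one
    }
}
```

Removing and re-inserting (rather than swapping two cells) gives the behavior of the Android home screen, where the other icons slide over to make room.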
I am developing a new program, and I need to let the user highlight words in a PDF file. Then I want to process the file to get a list of the highlighted words along with their positions.
How can I do that in Java?
Thanks in advance.
PDF is derived from PostScript (it is not PostScript itself), and its content can still be difficult to process. I doubt there's an easy way.
Take a look at http://java-source.net/open-source/pdf-libraries, but be aware that you might run into some difficulty.
Also, read http://partners.adobe.com/public/developer/en/pdf/HighlightFileFormat.pdf for the specs of the highlight format. Depending on what "place" information you need, that might be enough.
How are you displaying the PDF? If you are displaying it as an image, you just need the word coordinates. Something like PDFBox, JPedal, or maybe iText can provide them.
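If you go the PDFBox route, highlights made in a PDF viewer are stored as Highlight annotations whose QuadPoints entry holds 8 numbers (four x/y corners) per highlighted fragment. In PDFBox 2.x these should come back as PDAnnotationTextMarkup objects with getQuadPoints(), and each resulting rectangle can then be passed to PDFTextStripperByArea.addRegion(...), which as far as I know expects top-left-origin coordinates, to extract the words under it. The library-independent part, turning QuadPoints into rectangles with a top-left origin, is sketched below. HighlightGeometry is hypothetical, and it assumes an unrotated page whose media box starts at (0, 0).

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

public final class HighlightGeometry {
    private HighlightGeometry() {}

    /**
     * Converts a highlight annotation's QuadPoints array (8 numbers per quadrilateral,
     * PDF user space with a bottom-left origin) into one rectangle per quadrilateral
     * with a top-left origin.
     */
    public static List<Rectangle2D> quadsToRects(float[] quadPoints, float pageHeight) {
        if (quadPoints.length % 8 != 0) {
            throw new IllegalArgumentException("QuadPoints length must be a multiple of 8");
        }
        List<Rectangle2D> rects = new ArrayList<>();
        for (int i = 0; i < quadPoints.length; i += 8) {
            float minX = Float.MAX_VALUE, minY = Float.MAX_VALUE;
            float maxX = -Float.MAX_VALUE, maxY = -Float.MAX_VALUE;
            // Bounding box of the 4 corners, so the corner order used by the producer does not matter.
            for (int j = 0; j < 8; j += 2) {
                float x = quadPoints[i + j];
                float y = quadPoints[i + j + 1];
                minX = Math.min(minX, x);
                maxX = Math.max(maxX, x);
                minY = Math.min(minY, y);
                maxY = Math.max(maxY, y);
            }
            // Flip y so the rectangle's origin is the top-left corner of the page.
            rects.add(new Rectangle2D.Float(minX, pageHeight - maxY, maxX - minX, maxY - minY));
        }
        return rects;
    }
}
```

A highlight spanning two lines produces two quadrilaterals and therefore two rectangles, one per line fragment.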