Managing text in an Android application like in an eBook

I am developing an application that looks like an eBook reader, but it's not exactly an eBook reader.
I have a large body of text divided into chapters. I want to present that text as a book, so that the user gets the feel of reading an eBook, with features like GoTo, Search, Table of Contents and, most importantly, a page-curl transition between pages.
The problem is how to divide the whole content into pages. How can I know how many characters will fit on the screen (depending on the screen size and font size)? I am totally confused about where to start and how to proceed.
Ultimately I am planning to develop an eBook reader that reads only one book: the text I supply.
Please let me know how to achieve this, where to put page breaks, and where to store the text (in a database or a resource file).

I'm trying to use the Android TextSwitcher UI component.
It automatically breaks the text according to the screen size. You can add click listeners to move between pages.
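For the "where do I put page breaks" part of the question, here is a minimal plain-Java sketch, not a definitive implementation. It assumes you already have an estimate of how many characters fit on one page (the hypothetical maxChars parameter); on a device that number would have to come from measuring the text with the real font and view size, for example with Paint.breakText or a StaticLayout. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class Paginator {
    /**
     * Splits text into pages of at most maxChars characters, preferring to
     * break at whitespace so that words are not cut in half.
     * maxChars is an assumed per-page capacity, not a measured value.
     */
    static List<String> paginate(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> pages = new ArrayList<>();
        String rest = text.trim();
        while (!rest.isEmpty()) {
            if (rest.length() <= maxChars) {
                pages.add(rest);           // whatever is left fits on one page
                break;
            }
            // Search backwards for whitespace at or before index maxChars.
            int cut = -1;
            for (int i = maxChars; i > 0; i--) {
                if (Character.isWhitespace(rest.charAt(i))) {
                    cut = i;
                    break;
                }
            }
            if (cut == -1) {
                cut = maxChars;            // one word longer than a page: hard split
            }
            pages.add(rest.substring(0, cut).trim());
            rest = rest.substring(cut).trim();
        }
        return pages;
    }

    public static void main(String[] args) {
        for (String page : paginate("aaa bbb ccc", 7)) {
            System.out.println("[" + page + "]");
        }
    }
}
```

Computing the pages once per chapter and storing the resulting page start offsets would also make features like GoTo and Table of Contents a matter of looking up an index.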

Related

Best way to extract text from PDF in java

I want to make a program that is able to read PDF files and parse their contents.
Thus I need to extract the text using some kind of library. I found 3 ways to do so.
OCR libraries (like Tesseract)
ScanPdf libraries (like iText)
Converters from PDF to text.
I fail to understand the big differences between them since all of them will produce in the end a text file from the PDF. So which is the best way to go about this?

PDF is a complex format. If you open a PDF and you're staring at a bunch of text, that doesn't really tell you much. It could be that you're staring at an image file someone decided to wrap into a PDF file. This is 99%+ certain what you have if someone scanned a document and told their scanner to 'scan to PDF', and 100% certain what you got if you have a PNG or JPG and 'save as PDF', or try to 'print to PDF' such a thing.
There is no text in the PDF then. There are pixels.
To turn pixels into text, that's where OCR libraries come in. That's what they do. That is all they do. It's an AI bonanza and error prone. No guarantees.
However, PDF is more complex than that, it isn't like PNG/JPG: It's more like HTML. You can put actual text in there.
This has different issues, though. You can place text blobs (i.e. a 'rectangle', with coordinates, and then the text that is supposed to go inside). Again a lot like HTML: You can do something like:
<p class="foo">
World!
</p>
<p class="bar">
Hello,
</p>
and then create CSS so that the foo is rendered after the bar block (can be as simple as .foo, .bar { display: block; } .foo {float: right}).
Turning that HTML into "World! Hello," is not all that tricky. Realizing that during a render, you end up seeing "Hello, World!", and thus writing code that returns "Hello, World!", that's way more complicated.
The same problem applies to PDF. For simple PDFs, extracting the raw text inside is not too difficult, but be aware that for even mildly complex PDFs, the text can arrive in a jumbled mess.
iText is trying to give you enough power, at least, to provide the latter: to give you a full hierarchical breakdown. It returns 'here is a text box, here is its positioning, and here is the text inside; and now here is another text box, etc'. It does not return a big string.
In other words: the answer depends a lot on what PDFs you have / what PDFs you expect to be able to read, and how complex they are. If they are scans, you need an OCR library. If they are simple, a basic pdf2text converter will do fine. If you want to attempt to take into account fancily positioned PDFs with forms inside and 'popups' that can be opened and closed, oof. Probably all these tools are insufficient and you're signing up for many person-weeks' worth of effort.
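To make the "Hello, World!" ordering problem concrete, here is a small plain-Java sketch. It does not use iText or any other PDF library; TextBox is a hypothetical stand-in for the positioned fragments such a library might hand back. It also assumes y grows downwards from the top of the page (native PDF coordinates start at the bottom left, so a real extractor would need to flip that) and that sorting top-to-bottom, then left-to-right, is good enough, which only holds for simple single-column layouts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ReadingOrder {
    /** Hypothetical positioned text fragment; not a type from any real PDF library. */
    static final class TextBox {
        final double x;
        final double y;
        final String text;

        TextBox(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Joins fragments in visual order: top to bottom, then left to right. */
    static String join(List<TextBox> boxes) {
        List<TextBox> sorted = new ArrayList<>(boxes);
        sorted.sort(Comparator.comparingDouble((TextBox b) -> b.y)
                .thenComparingDouble(b -> b.x));
        StringBuilder sb = new StringBuilder();
        for (TextBox b : sorted) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stored order is "World!" first, but it is rendered to the right of "Hello,".
        List<TextBox> boxes = Arrays.asList(
                new TextBox(60, 10, "World!"),
                new TextBox(0, 10, "Hello,"));
        System.out.println(join(boxes));
    }
}
```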

There definitely IS text embedded in PDFs; it is NOT just pixels.
It depends on whether the PDF is a "true" PDF (i.e. you can highlight the text and copy and paste it elsewhere) or a scanned image.
With scanned images, you'll have to use an OCR API. All of the major cloud providers have OCR APIs (e.g. Amazon Textract, Google Document AI, Microsoft Form Recognizer, etc). If it's a true PDF, then I've found the pdf.js library (https://mozilla.github.io/pdf.js/) quite helpful in doing a direct text extraction.
Just know that doing this only gets you the text that is literally on the page, and there's still quite a bit of work to do to get key/value data fields programmatically across many documents.
This is something that my startup is working on (www.sensible.so/) too if you're interested in something more powerful!

Use OCR Text reader to save the text

I followed the OCR text reader guide on Codelabs (https://codelabs.developers.google.com/codelabs/mobile-vision-ocr/#0).
Now I would like to save a single portion of the text that I am scanning.
I tried reducing the width and height of the preview, but it doesn't work: the APK crashes (at least on the only device I have to test it on).
I am completely new to Java and Android development, but my internship mentor told me to do this completely alone, with zero help (as no one in the company knows about development).
So, the app opens and it recognizes text. Now I would like to know if there is a way to take that text and save it (to an XML or TXT file).
I tried to look in the code to see if at some point the recognized text is saved in a variable or something, but it looks like a live preview done through Google's dependencies (or a similar process).
I am not sure, but this might be off topic, as it is similar to an open question. Still, I am giving details on what I have done so far and what I have tried.
Thanks.

The detected text is drawn in the OcrGraphic.draw(Canvas) method. There, it is available as a TextBlock. You can call textBlock.getComponents() to get the lines, and then call getComponents() again on each line to get the individual words (each as a Text object).
Then you can convert it to a string (each Text object has getValue()) and write the text to a file if you would like.
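Once the recognized lines are plain strings, writing them out is ordinary Java I/O. The following is a minimal sketch under a few assumptions: the lines have already been collected into a List<String>, the class and method names are hypothetical, and java.nio.file is available (on Android that means API level 26 or later; on older devices a FileOutputStream from the app's files directory would be the usual substitute).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class RecognizedTextSaver {
    /** Writes one recognized line per line of the target file, as UTF-8. */
    static void save(List<String> lines, Path target) throws IOException {
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for text collected from the OCR callbacks.
        List<String> recognized = Arrays.asList("first line", "second line");
        Path target = Files.createTempFile("ocr-", ".txt");
        save(recognized, target);
        System.out.println("Saved to " + target);
    }
}
```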

How to add a non-printable image in xsl / apache-fop

I'm trying to generate an XSL to be printed on a pre-printed sheet, which works fine.
Now I want to give the user a better preview (in the on-screen PDF version) by adding a background image that emulates the "pre-printed" stuff on the sheet, to give the user a "context" for what they are printing.
The question is: is there any way I can set a background image in XSL (using Apache FOP) that is visible only in the PDF but not in the printed version of it?
Thank you all for reading or giving any advice.

Although, as the comments state, you can't have content in the PDF that does not come out in a physical printed copy, here is one possible workaround. Depending on how your users are ultimately going to be using FOP for PDF rendering and how you are driving the workflow, it's possible to pass a parameter into an XSLT file before the transformation phase is run. So, potentially, you could do a dual rendering of the same PDF: one that is presented to the user with the background image enabled, and one that gets printed. You could just set a variable similar to how they do in this example, call it something like $isPreview, and use a simple if or choose statement to check for 'Y' or 'N'.
Since you are sending to a printer, you may even want to take advantage of FOP's ability to generate PostScript rather than PDF. I've used this feature quite extensively for print documents using FOP while also producing a PDF copy for electronic delivery via email or hosted services, and I've yet to find any discrepancy between the PDF rendering and what is printed after sending a rendered PostScript file, so it should work well for you as well.
As I said, this is not truly a solution to your problem as you've presented it, but as a workaround, it could get you the desired results if you're clever about how you implement it.
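A rough sketch of the parameter idea, using only the JDK's built-in XSLT support (javax.xml.transform) rather than FOP itself. The stylesheet is a toy that emits plain text so the branch taken is visible; a real stylesheet would emit XSL-FO and put the background image inside the preview branch. The parameter name isPreview follows the suggestion above; the class name, the placeholder input document and the output strings are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PreviewRender {
    // Toy stylesheet: picks a branch based on the isPreview parameter.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:param name=\"isPreview\" select=\"'N'\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:choose>"
      + "<xsl:when test=\"$isPreview = 'Y'\">background-on</xsl:when>"
      + "<xsl:otherwise>background-off</xsl:otherwise>"
      + "</xsl:choose>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the same stylesheet twice over: once for preview, once for print. */
    static String render(boolean preview) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
        // The parameter is set before the transformation runs.
        transformer.setParameter("isPreview", preview ? "Y" : "N");
        StringWriter out = new StringWriter();
        transformer.transform(
                new StreamSource(new StringReader("<doc/>")),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println("preview: " + render(true));
        System.out.println("print:   " + render(false));
    }
}
```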

I don't think the statement that it is not possible is true; I am just not sure how to create such a PDF with FOP. Certainly you can add an image field. One would use a button field and place the image in the button. Then you would set the properties of that button to not print (printable false).
PDF supports images in fields: https://answers.acrobatusers.com/adding-image-field-form-q41825.aspx
RenderX supports PDF Form fields but I do not see where they support an image inside the button, only text: http://www.renderx.com/reference.html#PDF%20Forms. But they do support setting a field to "printable".

Multi-page application with corresponding pictures and text

I'm new to Java and Android. I have been trying for the past week to make an app for my phone. The app consists of 4 pages, which are diagrammed below:
Page 1: Contains a picture taking up the size of the screen. If I click on the picture it needs to go to "page 2".
Page 2: Consists of an icon on the left (say the flag for instance) followed by a text field (eg. USA). When "USA" strip is clicked it needs to go to page 3.
Page 3: Consists of text, picture and then more text from a string. This page needs to correspond to the strip clicked on in page 2 ("USA" in this example). There are also two buttons at the bottom of "page 3" and "page 4" which when pressed need to go to the corresponding page numbers.
Page 4: This page is displayed if the "More" button is pressed on "page 3".
I would like this app to work on Android 2.2 or 2.3 at minimum. All logos, pictures and string texts need to be locally available (resources folder) and not website based. I have tried all sorts of combinations of ListViews and buttons with OnClickListeners, as well as toast messages. My limited knowledge of programming is frustrating.
My question is if there is a template around which will help me out with this app? Or if there are any web resources.

In Android, individual "pages" or "screens" can be implemented as Activitys. You need to extend the Activity class and add the components that you wish to display. Most of the layout can be done in an XML file. I strongly suggest that you google for a tutorial that illustrates the basics of Android programming. From there, you can start by creating an app with two pages. And then just keep adding a little bit at a time until you get the complete app that you want.

There are some easy ways to do this. If you want to be backwards compatible to SDK 8, you can do this in two ways: you can use Fragments or Activities for each layout. If you implement Fragments, I suggest reading http://developer.android.com/guide/components/fragments.html, where you will read about using FragmentActivity instead of Activity. Also read http://developer.android.com/tools/extras/support-library.html, as it describes the libraries needed to use Fragments in older versions of the SDK. It looks like you are looking for some navigation buttons that sit at the bottom of the layout. To fully implement a layout that has this view at the bottom of the screen in SDK 8, you will need to create your own View and place it at the bottom of the screen. A RelativeLayout with android:layout_alignParentBottom="true" on that view will accomplish this. There is also a way to do this using a ViewPager, which will also work within your backwards-compatibility requirements. I am sorry about the amount of information I am throwing at you, but I really would recommend looking into other questions posted by users on this topic, "How to navigate to another page in android?" for example. Good luck. Everything you are looking for can be learned through a Google search, though maybe not all at once.

Printing data to a pre printed form/stationery

We have a requirement where we already have pre-printed stationery and want the user to enter data in an HTML form and be able to print that data on the stationery. Alignment, text size, etc. are very important, since the pre-printed stationery already has a box for each character. What would be a good way to achieve this in Java? I have been thinking of using Jasper reports. Any other options? Maybe overlaying an image with text or something?
We might also need the capability to print on plain paper, in which case the boxes need to be printed by our application, and the printed result should match the pre-printed stationery once it has been filled in with data.
Is there an open source framework for this kind of thing?

Jaspersoft reports -- http://sourceforge.net/projects/jasperreports/
You will then create XML templates, then you will be able to produce a report in PDF, HTML, CSV, XLS, TXT, RTF, and more. It has all the necessary options to customize the report. Used it before and recommend it.
You will create the templates with iReport then write the code for the engine to pass the data in different possible ways.
Check http://www.jaspersoft.com/jasperreports
Edit:
You can have background images, overlay the boxes on top of them, set a limit on the maximum number of characters, and much more.
It is very powerful and gives you plenty of options.
Here is one of the iReport tutorials for a background image: http://ireport-tutorial.blogspot.com/2008/12/background-image-in-ireport.html

The big problem when printing form content that has been filled in electronically, is aligning it correctly on the pre-printed form. You may get content to align for one printer, but when you use another it is completely misaligned.
Fly Software have a form design product called InForm Designer that gets around the problem nicely by allowing users to specify and save vertical and horizontal offsets for printers. This ensures filled in form content is always aligned. I've tried it and it works perfectly. Take a look for yourself here...
http://www.flysoftware.com/products/inform_designer/overview.asp
It might be worth implementing a printer offset similar to InForm's in your own application (if possible).
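The per-printer offset idea can be sketched in a few lines of Java. This is only an illustration of the concept, not how InForm Designer works internally: the class, the printer names and the choice of millimetres as the unit are all assumptions. Each printer gets a saved horizontal and vertical correction, and every field position is shifted by it just before printing.

```java
import java.util.HashMap;
import java.util.Map;

class PrinterOffsets {
    // Saved corrections per printer name: {dx, dy} in millimetres.
    private final Map<String, double[]> offsetsMm = new HashMap<>();

    /** Stores the calibration a user found for one printer. */
    void setOffset(String printer, double dxMm, double dyMm) {
        offsetsMm.put(printer, new double[] {dxMm, dyMm});
    }

    /** Returns the field position shifted by that printer's saved offset. */
    double[] adjust(String printer, double xMm, double yMm) {
        double[] offset = offsetsMm.getOrDefault(printer, new double[] {0.0, 0.0});
        return new double[] {xMm + offset[0], yMm + offset[1]};
    }

    public static void main(String[] args) {
        PrinterOffsets offsets = new PrinterOffsets();
        offsets.setOffset("office-laser", 1.5, -0.5);
        double[] pos = offsets.adjust("office-laser", 20.0, 40.0);
        System.out.println(pos[0] + ", " + pos[1]);
    }
}
```

Printers with no saved calibration fall back to a zero offset, so the form layout itself never has to change per printer.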

Some things to think about.
First, in terms of the web page, do you want to use the stationery as the form layout?
Does it have to be exact?
Combed boxes (one for each character):
Do you want to show it like that on the web page, or deal with the combing later?
How are you going to deal with, say, a combed 6-digit number? Is it right-aligned? What if they enter 7 digits? Same for text: what if it won't fit?
Font choices, we had a lot of fun with W...
How well aligned do you want each character within its box, and what font limitations does that imply? Some of the auto-magic software we looked at did crap like change the size of each character.
Combed editing is a nightmare. We display combed, but raise an edit surface the size of the full box on selection.
Another thing that might drive you barking mad: you may find small differences in the size and layout of the boxes, so they look okay from a distance but a column of boxes sort of shifts about by a pixel. Some of the testing guys had to lend us their electron microscopes, so we could see how many ink molecules we were out by. :(
Expect to spend a lot of time in the UI side of things, and remember printed stationery changes, so giving yourself some sort of meta description of the form to start with will save you loads of trouble later on.
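The combed-field questions above (alignment, and what happens on overflow) can be pinned down in code early. Here is a minimal Java sketch with a hypothetical CombField helper: it places one character per box, right-aligns when asked, and rejects input that does not fit rather than shrinking or truncating it. Rejecting is just one possible policy, chosen here for illustration.

```java
import java.util.Arrays;

class CombField {
    /**
     * Lays out value into a fixed number of boxes, one character per box.
     * Unused boxes hold a space. Throws if the value has more characters
     * than there are boxes.
     */
    static char[] fill(String value, int boxes, boolean rightAlign) {
        if (value.length() > boxes) {
            throw new IllegalArgumentException(
                    "value needs " + value.length() + " boxes, only " + boxes + " available");
        }
        char[] cells = new char[boxes];
        Arrays.fill(cells, ' ');
        int start = rightAlign ? boxes - value.length() : 0;
        for (int i = 0; i < value.length(); i++) {
            cells[start + i] = value.charAt(i);
        }
        return cells;
    }

    public static void main(String[] args) {
        // A 4-digit number in a 6-box numeric field, right-aligned.
        System.out.println("[" + new String(fill("1234", 6, true)) + "]");
        // Text in a 4-box field, left-aligned.
        System.out.println("[" + new String(fill("AB", 4, false)) + "]");
    }
}
```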
