StringUtils.containsIgnoreCase() is returning wrong result - java

I have two values:
String a = "00tz"; // (Eclipse internal debug value: [, 0, 0, t, z]) and
String b = "tz"; // (Eclipse internal debug value: [, t, z])
I am reading these values from an ArrayList like this:
for (String a : stringLists) {
...
}
I get false when I compare these two values with StringUtils.containsIgnoreCase(a, b), but it should return true because "tz" occurs in "00tz".
I'm using org.apache.commons.lang3.StringUtils. Changing the values of a and b did not help. The length of a is 5 and the length of b is 3. a.contains(b) also returns false.
These are the results when I print the values with
System.out.println(Arrays.toString(a.getBytes(StandardCharsets.UTF_8)));
a:[-17, -69, -65, 48, 48, 118, 119]
b:[-17, -69, -65, 118, 119]
I'm reading these values from a .txt file that contains several values like this. I read it this way:
File fileA = new File("test/a.txt");
File fileB = new File("test/b.txt");
lista = (ArrayList<String>) FileUtils.readLines(fileA, "utf-8");
listb = (ArrayList<String>) FileUtils.readLines(fileB, "utf-8");
Do you have an idea what the problem is?
Thank you!
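A note on the byte dump above: both arrays start with -17, -69, -65, which is the UTF-8 encoding of the byte order mark U+FEFF. That would also explain the lengths 5 and 3 and the empty first element in the debugger view. The following is a minimal sketch, assuming the BOM is the only invisible character involved; BomDemo and stripBom are hypothetical names, not part of commons-lang.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch; BomDemo and stripBom are hypothetical names.
class BomDemo {
    private static final char BOM = '\uFEFF';

    // Removes one leading byte order mark, if present.
    static String stripBom(String s) {
        return (s != null && !s.isEmpty() && s.charAt(0) == BOM) ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        // Same shape as the values in the question: BOM bytes followed by the text.
        String a = new String(new byte[] {-17, -69, -65, 48, 48, 116, 122}, StandardCharsets.UTF_8);
        String b = new String(new byte[] {-17, -69, -65, 116, 122}, StandardCharsets.UTF_8);

        System.out.println(a.length() + " " + b.length());      // 5 3
        System.out.println(a.contains(b));                       // false: the BOM in b is followed by 't', in a by '0'
        System.out.println(stripBom(a).contains(stripBom(b)));   // true
    }
}
```

If this is the cause, stripping the mark from each value after reading (or saving the files without a BOM) should make both contains and containsIgnoreCase behave as expected.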

Related

How to return a list contains Object in sqlalchemy in python, is similar to List<OrderInfo> in java

I want to get a list that contains entire OrderInfo objects. For example, the following gives the result I want:
def find_all(self):
    result_list = []
    orderDao = DaoUtil.DaoGeneric()
    session = orderDao.getSession()
    try:
        for row in session.query(OrderInfo).all():
            result_list.append({
                'id': row.id,
                'name': row.name,
                'age': row.age,
                'create_time': row.create_time.strftime("%Y-%m-%d %H:%M:%S"),
                'update_time': row.update_time.strftime("%Y-%m-%d %H:%M:%S"),
                'version': row.version
            })
        session.commit()
    except Exception, e:
        print e
        session.rollback()
    return result_list
But I want a list that contains the OrderInfo objects from the query. Each row the query returns carries other attributes (those of every DeclarativeBase instance) besides OrderInfo{id, name, age, create_time, update_time, version}, so the query does not seem to return plain OrderInfo objects directly. This is what I would like to write:
def find_all(self):
    result_list = []
    orderDao = DaoUtil.DaoGeneric()
    session = orderDao.getSession()
    try:
        for row in session.query(OrderInfo).all():
            # Wanted: something like this, if the row had a property holding the OrderInfo object.
            # Java can do this directly, e.g. List<OrderInfo> orderList = session.query();
            # please help me achieve the same.
            result_list.append(row.orderInfo)
        session.commit()
    except Exception, e:
        print e
        session.rollback()
    return result_list
Because I have only just started using SQLAlchemy in Python, I am not very sure. How do I get a list of OrderInfo objects from a query in SQLAlchemy?
@univerio
for row in session.query(OrderInfo).all():
Each row in this loop exposes the following attributes:
_decl_class_registry,
_sa_class_manager,
_sa_instance_state,
metadata,
query,
id,
name,
age,
create_time,
update_time ,
version
Only OrderInfo{id, name, age, create_time, update_time, version} is what I want to get; I do not want the other attributes.
OrderInfo:
from sqlalchemy import Column, Integer, String, Date, DateTime
from sqlalchemy.ext.declarative import declarative_base
Base=declarative_base()
class OrderInfo(Base):
    __tablename__ = 'order_info'
    # __table__ = 'order_info'
    id = Column(Integer, primary_key=True)
    name = Column(String(100))
    age = Column(Integer)
    create_time = Column(Date)
    update_time = Column(Date)
    version = Column(Integer)
@univerio, I am writing here because comments have a length limit. This achieves the result:
for row in session.query(OrderInfo).all():
    result_list.append({
        'id': row.id,
        'name': row.name,
        'age': row.age,
        'create_time': row.create_time.strftime("%Y-%m-%d %H:%M:%S"),
        'update_time': row.update_time.strftime("%Y-%m-%d %H:%M:%S"),
        'version': row.version
    })
OrderInfo has very few columns. If it had several hundred columns, writing this out would take a long time, so I want a simpler way to achieve the same thing, similar to a List in Java.
@univerio,
I found the answer I was looking for:
def find_all(self):
    result_list = []
    orderDao = DaoUtil.DaoGeneric()
    session = orderDao.getSession()
    try:
        for row in session.query(OrderInfo).all():
            result_list.append(DictUtil.object_as_dict(row))
        session.commit()
    except Exception, e:
        print e
        session.rollback()
    return result_list

# In DictUtil; requires: from sqlalchemy import inspect
def object_as_dict(obj):
    # Build a dict from the mapped column attributes only.
    result = {instance.key: getattr(obj, instance.key) for instance in inspect(obj).mapper.column_attrs}
    print result
    return result
Output:
[
{'updateTime': datetime.datetime(2017, 6, 15, 13, 56, 16), 'bankName': u'ICBC', 'bankNo': u'6228480666622220011', 'createTime': datetime.datetime(2017, 6, 15, 13, 56, 16), u'version': 0, u'id': 1},
{'updateTime': datetime.datetime(2017, 6, 15, 13, 57, 40), 'bankName': u'ICBC', 'bankNo': u'6228480666622220011', 'createTime': datetime.datetime(2017, 6, 15, 13, 57, 40), u'version': 0, u'id': 2},
{'updateTime': datetime.datetime(2017, 6, 15, 13, 58), 'bankName': u'ICBC', 'bankNo': u'6228480666622220011', 'createTime': datetime.datetime(2017, 6, 15, 13, 58), u'version': 0, u'id': 3}
]

Threads constantly interrupting each other, log doesn't reflect system operations

I built a system that simulates memory paging, just like an MMU.
And to better regulate and understand how it works, I am logging it.
My problem is that it seems the log is not accurately reflecting the operations of the system, or rather it does, but then I have a big problem with threads that I need some help solving.
I'll try and explain.
public void run() //gets pages and writes to them
{ // i printed the pageId of every process to check they are running at the same time and competing for resources
    for (ProcessCycle currentCycle : processCycles.getProcessCycles())
    {
        Long[] longArray = new Long[currentCycle.getPages().size()];
        try {
            for (int i = 0; i < currentCycle.getPages().size(); i++)
            {
                MMULogger.getInstance().write("GP:P" + id + " " + currentCycle.getPages().get(i) + " " + Arrays.toString(currentCycle.getData().get(i)), Level.INFO);
            }
            Page<byte[]>[] newPages = mmu.getPages(currentCycle.getPages().toArray(longArray));
            List<byte[]> currentPageData = currentCycle.getData();
            System.out.println("process id " + id);
            for (int i = 0; i < newPages.length; i++)
            {
                byte[] currentData = currentPageData.get(i);
                newPages[i].setContent(currentData);
            }
            Thread.sleep(currentCycle.getSleepMs());
        } catch (ClassNotFoundException | IOException | InterruptedException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}
This code snippet is taken from a class called Process. Just like in a computer, I have multiple processes, and each of them needs to read and write pages, which it requests from the class called MMU. That is the mmu.getPages call.
We also write to our log file inside the getPages method:
public synchronized Page<byte[]>[] getPages(java.lang.Long[] pageIds) throws java.io.IOException, ClassNotFoundException
{
    @SuppressWarnings("unchecked")
    Page<byte[]>[] toReturn = new Page[pageIds.length];
    for (int i = 0; i < pageIds.length; i++)
    {
        Long currentPage = algo.getElement(pageIds[i]);
        if (currentPage == null) //page not found in RAM
        {
            if (ram.getInitialCapacity() != ram.getNumOfPages()) //ram is NOT full
            {
                MMULogger.getInstance().write("PF:" + pageIds[i], Level.INFO);
                algo.putElement(Long.valueOf(pageIds[i]), Long.valueOf(pageIds[i]));
                ram.addPage(HardDisk.getInstance().pageFault(pageIds[i]));
            }
            else //ram is full
            {
                Long IDOfMoveToHdPage = algo.putElement(pageIds[i], pageIds[i]);
                Page<byte[]> moveToHdPage = ram.getPage((int)((long)IDOfMoveToHdPage));
                Page<byte[]> moveToRAM = HardDisk.getInstance().pageReplacement(moveToHdPage, pageIds[i]);
                ram.removePage(moveToHdPage);
                ram.addPage(moveToRAM);
                MMULogger.getInstance().write("PR: MTH " + moveToHdPage.getPageId() + " MTR " + moveToRAM.getPageId(), Level.INFO);
            }
        }
        toReturn[i] = ram.getPage((int)((long)pageIds[i]));
    }
    return toReturn;
}
So all in all to recap - a process requests pages, I write to the log file which process requests which page and what it wants to write to it, and then I call mmu.getpages, and the logic of the system continues.
My problem is that the log looks like this:
GP:P2 5 [102, 87, -9, 85, -5]
GP:P1 1 [-9, -18, 50, -124, -102]
GP:P4 10 [79, -51, 67, 118, 111]
GP:P2 6 [-20, -22, 3, -74, -65]
GP:P3 7 [90, 56, 91, 71, -115]
PF:5
GP:P6 18 [28, -39, -3, 64, -117]
GP:P5 13 [72, -26, 52, -84, 6]
GP:P4 11 [-55, -70, -88, -9, 38]
GP:P1 2 [39, 112, -117, 5, 109]
GP:P5 12 [38, -31, 18, -40, 36]
which is not what I wanted. At first you can see that process 2 requested page 5 and wanted to write [102, 87, -9, 85, -5] to it.
After that line I would have expected to see "PF:5", but it appears further down. I think this is because process 2 ran out of its time slice and did not manage to finish the mmu.getPages operation, so it had not yet printed PF:5 to the file when the other threads logged their lines.
That is a problem for me. I want the processes to run simultaneously in a multithreaded fashion, but I want the log to be of the form:
GP:P2 5 [1,1,1,1,1]
PF:5
GP:P2 7 [1,2,3,4,5]
PF:7
GP:P19 12 [0,0,0,0,0]
PF:12
For example
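In the code above, the GP line is written in run() before getPages is entered, so only the PF/PR lines are covered by the synchronized method and other threads can log in between. One possible way to get the desired pairing is to write the GP line and perform the page request while holding the same lock. The following is a minimal sketch under that assumption, not the original system: PagedLogDemo, LOG, LOG_ORDER_LOCK and requestPage are hypothetical stand-ins for MMULogger, the log file and mmu.getPages.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: all names here are hypothetical stand-ins for the classes in the question.
class PagedLogDemo {
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());
    private static final Object LOG_ORDER_LOCK = new Object();

    // Writes the "GP" line and performs the page request under one lock, so the
    // "PF" line produced by the request cannot be separated from its "GP" line.
    static void requestPage(int processId, long pageId) {
        synchronized (LOG_ORDER_LOCK) {
            LOG.add("GP:P" + processId + " " + pageId);
            LOG.add("PF:" + pageId); // stands in for the logging done inside getPages
        }
    }

    // Starts the given number of "process" threads and returns the resulting log.
    static List<String> runDemo(int processes, int pagesPerProcess) {
        LOG.clear();
        List<Thread> threads = new ArrayList<>();
        for (int p = 1; p <= processes; p++) {
            final int id = p;
            Thread t = new Thread(() -> {
                for (int i = 0; i < pagesPerProcess; i++) {
                    requestPage(id, id * 100L + i);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        runDemo(4, 3).forEach(System.out::println);
    }
}
```

The threads still run concurrently between requests, but each request and its log lines become one atomic step. Since getPages is already synchronized, the extra serialization is probably small, though that depends on how long the real logging takes.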

how to import a mat value in openCV

I exported many calcHist Mat values to a file and computed the average of the values. Now I want to import that average back into my Java program (hard-coded or read from a file, it doesn't matter) to compare the average histogram with the histogram of a given image. The problem is that I don't know how to import the values into a Mat variable.
Example hist-values:
[680.365; 898.065; 583.155; 971.535; 454.825; 202.34; 213.26; 316.98; 449.765; 9902.57; 357078.595; 1292.21; 521.705; 157.985; 109.985; 137.685; 301.395; 127.58; 0; 0; 0; 0; 0; 0]
If I hard-code the values with MatOfDouble (I don't know another way to do it), like this:
MatOfDouble averageHist= new MatOfDouble(680.365, 898.065, 583.155, 971.535, 454.825, 202.34, 213.26, 316.98, 449.765, 9902.57, 357078.595, 1292.21, 521.705, 157.985, 109.985, 137.685, 301.395, 127.58, 0, 0, 0, 0, 0, 0);
I can use the variable for a compareHist like this:
double res= Imgproc.compareHist(baseHist, averageHist, Imgproc.CV_COMP_CORREL);
and it compiles, but at runtime it throws an exception.
The program works if I only use Mat values that come directly from calcHist, such as the variable baseHist, but I want averageHist to be hard-coded or read from a file.
I tried to cast the MatOfDouble to a plain Mat like this:
Mat averSunnyHistCast = new Mat();
averSunnyHistCast = (Mat) averageHist;
but it doesn't help.
This is the error:
CvException [org.opencv.core.CvException: cv::Exception: C:\builds\master_PackSlaveAddon-win64-vc12-static\opencv\modules\imgproc\src\histogram.cpp:2281: error: (-215) H1.type() == H2.type() && H1.depth() == CV_32F in function cv::compareHist
Thanks for your help!
This post helped: OpenCV How to initialize Mat with 2D array in JAVA
I put my values into a static array and then added each value to the Mat variable with the put method.
double[] dblArr = {650, 1230, 257, 66, 38, 19, 20, 33, 75, 15617, 350640, 2737, 1251, 547, 325, 328, 417, 150, 0, 0, 0, 0, 0, 0, 0};
// compareHist requires CV_32F, so the Mat is created with that type.
Mat averageHist = new Mat(25, 1, CvType.CV_32F);
for (int i = 0; i < 25; i++) {
    averageHist.put(i, 0, dblArr[i]);
}
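For the "read it from a file" variant, the exported text (values separated by semicolons inside square brackets, as in the example hist-values above) first has to be turned into numbers. The following is a small sketch in plain Java, assuming exactly that text format; HistParser is a hypothetical helper, not an OpenCV class.

```java
import java.util.Arrays;

// Sketch only: HistParser is a hypothetical helper, not part of OpenCV.
class HistParser {
    // Parses text such as "[680.365; 898.065; 0]" into 32-bit floats.
    static float[] parse(String text) {
        String body = text.trim();
        if (body.startsWith("[") && body.endsWith("]")) {
            body = body.substring(1, body.length() - 1);
        }
        if (body.trim().isEmpty()) {
            return new float[0];
        }
        String[] parts = body.split(";");
        float[] values = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = Float.parseFloat(parts[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        float[] hist = parse("[680.365; 898.065; 583.155; 0; 0]");
        System.out.println(Arrays.toString(hist));
        // With OpenCV on the classpath, the array could then be copied into a
        // CV_32F matrix, for example:
        //   Mat averageHist = new Mat(hist.length, 1, CvType.CV_32F);
        //   averageHist.put(0, 0, hist);
    }
}
```

Parsing straight to float keeps the data in the 32-bit depth that the compareHist assertion in the error message asks for.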

Lotus Notes 8.5 : Create a meeting with java

I have started working with Lotus Notes 8.5, and I have to create meetings in Lotus Notes using Java, lotus.domino and Notes.jar.
To do this, I create a new Document and fill it with all the needed fields (I think). This is how I create my meeting:
Document newDocument = db.createDocument();
newDocument.appendItemValue("Body", "Create meeting from java.");
newDocument.appendItemValue("Notes", "Test Notes");
newDocument.appendItemValue("Chair", "Me");
newDocument.appendItemValue("Principal", "Me");
newDocument.appendItemValue("$altPrincipal", "Me");
newDocument.appendItemValue("ExcludeFromView", "D,S");
newDocument.appendItemValue("UpdateSeq", 1);
newDocument.appendItemValue("$CSVersion", "2");
newDocument.appendItemValue("$SMTPKeepNotesItems", "1");
newDocument.appendItemValue("$CSWISL", "$S:1,$L:1,$B:1,$R:1,$E:1,$W:1,$O:1,$M:1,RequiredAttendees:1,INetRequiredNames:1,AltRequiredNames:1,StorageRequiredNames:1,OptionalAttendees:1,INetOptionalNames:1,AltOptionalNames:1,StorageOptionalNames:1,ApptUNIDURL:1,STUnyteConferenceURL:1,STUnyteConferenceID:1,SametimeType:1,WhiteBoardContent:1,STRoomName:1");
newDocument.appendItemValue("WebDateTimeInit", "1");
newDocument.appendItemValue("OrgTable", "C0");
newDocument.appendItemValue("$AlarmDisabled", "1");
newDocument.appendItemValue("$HFFlags", "1");
newDocument.appendItemValue("Form", "Appointment");
newDocument.appendItemValue("$FromPreferredLanguage", "fr");
newDocument.appendItemValue("ApptUNID", "267DEFCD6ADE4EF8C1257DF600464A1B642");
newDocument.appendItemValue("$LangChair", "");
newDocument.appendItemValue("AppointmentType", "3");
newDocument.appendItemValue("$TableSwitcher", "Description");
newDocument.appendItemValue("OnlineMeeting", "");
newDocument.appendItemValue("From", "Me");
newDocument.appendItemValue("AltChair", "Me");
newDocument.appendItemValue("OnlinePlace", "");
newDocument.appendItemValue("IsBroadcast", "");
newDocument.appendItemValue("$ExpandGroups", 3);
newDocument.appendItemValue("IsTeamCalendar", "");
newDocument.appendItemValue("Importance", "");
newDocument.appendItemValue("OrgConfidential", "");
newDocument.appendItemValue("Subject", "Meeting test from Java");
newDocument.appendItemValue("PreventCounter", "");
newDocument.appendItemValue("Location", "R1");
newDocument.appendItemValue("RoomToReserve", "");
newDocument.appendItemValue("Resources", "");
newDocument.appendItemValue("$PaperColor", 1);
newDocument.appendItemValue("STRecordMeeting", "");
newDocument.appendItemValue("WhiteBoardContent", "");
newDocument.appendItemValue("Categories", "");
newDocument.appendItemValue("$BorderColor", "7F96A3");
newDocument.appendItemValue("$WatchedItems", "$S,$L,$B,$R,$E,$W,$O,$M,RequiredAttendees,INetRequiredNames,AltRequiredNames,StorageRequiredNames,OptionalAttendees,INetOptionalNames,AltOptionalNames,StorageOptionalNames,ApptUNIDURL,STUnyteConferenceURL,STUnyteConferenceID,SametimeType,WhiteBoardContent,STRoomName");
newDocument.appendItemValue("CalForwardChairNotificationTo", "");
newDocument.appendItemValue("ReturnReceipt", "");
newDocument.appendItemValue("PreventDelegate", "");
newDocument.appendItemValue("EnterSendTo", "");
newDocument.appendItemValue("EnterCopyTo", "");
newDocument.appendItemValue("EnterBlindCopyTo", "");
newDocument.appendItemValue("ConferenceCallInfo", "");
newDocument.appendItemValue("SchedulerSwitcher", "1");
newDocument.appendItemValue("$Abstract", "");
newDocument.appendItemValue("StartTimeZone", "Z=-1$DO=1$DL=3 -1 1 10 -1 1$ZN=Western/Central Europe");
newDocument.appendItemValue("EndTimeZone", "Z=-1$DO=1$DL=3 -1 1 10 -1 1$ZN=Western/Central Europe");
newDocument.appendItemValue("NewStartTimeZone", "");
newDocument.appendItemValue("NewEndTimeZone", "");
newDocument.appendItemValue("Encrypt", "Représentation invalide. (undefined)");
newDocument.appendItemValue("Sign", "");
newDocument.appendItemValue("MeetingType", "");
newDocument.appendItemValue("$PublicAccess", "1");
newDocument.appendItemValue("StartDate", "27/02/2015");
newDocument.appendItemValue("StartTime", "11:00:00");
newDocument.appendItemValue("STARTDATETIME", s.createDateTime(new GregorianCalendar(2015, 02, 27, 11, 0, 0)));
newDocument.appendItemValue("EndDate", "27/02/2015");
newDocument.appendItemValue("EndTime", "13:00:00");
newDocument.appendItemValue("EndDateTime", s.createDateTime(new GregorianCalendar(2015, 02, 27, 13, 0, 0)));
newDocument.appendItemValue("CalendarDateTime", s.createDateTime(new GregorianCalendar(2015, 02, 27, 11, 0, 0)));
newDocument.appendItemValue("_ViewIcon", 158);
newDocument.appendItemValue("$ShowComments", "Normal");
newDocument.appendItemValue("$ShowDescription", "Show");
newDocument.appendItemValue("$BusyName", "Me");
newDocument.appendItemValue("$BusyPriority", "1");
newDocument.appendItemValue("SequenceNum", 2);
newDocument.appendItemValue("$CSTrack", "Imported from my contacts at 02/24/2015 14:00:17");
newDocument.appendItemValue("$NoPurge", s.createDateTime(new GregorianCalendar(2015, 02, 27, 13, 0, 0)));
newDocument.appendItemValue("$UpdatedBy", "Me");
newDocument.appendItemValue("$Revisions", s.createDateTime(new GregorianCalendar(2015, 02, 24, 13, 48, 31)));
newDocument.appendItemValue("tmpUseLongDate", s.createDateTime(new GregorianCalendar(2015, 02, 27, 11, 0, 0)));
newDocument.appendItemValue("tmpEventLabel", "Meeting from Java (tmpEventLabel)");
newDocument.appendItemValue("dispRepeatText", "Meeting from Java (dispRepeatText)");
newDocument.appendItemValue("tmpHideTimeZone", "");
newDocument.appendItemValue("tmpStartDate1", "27/02/2015");
newDocument.save();
When I go back to Lotus Notes, the meeting has been created and is shown in the calendar.
The problem is that, whatever field I add or remove, when I click on this meeting Lotus Notes shows an error saying (approximate translation from French):
Field : 'tmpStartDate1' : Temporary data required for operator or @ function
So I add this new field in Java, launch the program, create a meeting, and...
Field : 'tmpHideTimeZone' : Temporary data required for operator or @ function
So far, this is the 5th field Lotus Notes has asked for with this message.
All the 'tmp******' fields are missing, but I don't know what they are. I thought those fields might be generated by Lotus Notes, because of the 'tmp' prefix, but how?
I compared with other meetings created with Lotus Notes, and they have no such 'tmp' fields.
Any ideas?
EDIT :
I tried this:
newDocument.computeWithForm(true, true);
It throws a NotesException if the document isn't in a valid format.
When I execute my code I get (approximate translation again):
NotesException: Incorrect data type in the field.
I tried computeWithForm in another program where I create and add a contact to Lotus Notes, and I didn't get any errors.
So there is indeed a problem with the meeting's fields.
EDIT :
Any new ideas? I am a bit confused about how to develop for this software.
Items have types too ! Also a date is not a string, a date is a complicated animal, looks innocuous at first sight, but can bite very hard.
The Document.appendItemValue method coerces the item to be of Text type.
For date/time (not temporary ;-) items you should use Document.ReplaceItemValueCustomData and pass it an argument of class DateTime.
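One more date pitfall in the question's code, separate from the 'tmp' field errors: the month argument of GregorianCalendar is zero-based, so new GregorianCalendar(2015, 02, 27, ...) denotes 27 March, while the text fields say 27/02/2015. A short, plain-Java sketch (CalendarMonthDemo is a hypothetical name):

```java
import java.util.Calendar;
import java.util.GregorianCalendar;

// Sketch only: demonstrates the zero-based month argument of GregorianCalendar.
class CalendarMonthDemo {
    // Month index as written in the question: 02 is an octal literal equal to 2.
    static GregorianCalendar asWritten() {
        return new GregorianCalendar(2015, 02, 27, 11, 0, 0);
    }

    // Same day using the named constant for February.
    static GregorianCalendar february() {
        return new GregorianCalendar(2015, Calendar.FEBRUARY, 27, 11, 0, 0);
    }

    public static void main(String[] args) {
        System.out.println(asWritten().get(Calendar.MONTH) == Calendar.MARCH);    // true
        System.out.println(february().get(Calendar.MONTH) == Calendar.FEBRUARY);  // true
    }
}
```

Using the Calendar constants avoids the off-by-one and also the leading-zero octal literals, which stop compiling for 08 and 09.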

Reading PDF Literal String parsing dilemma

I have the following contents in the same PDF page, in different ObjectX:
First:
[(some text)] TJ ET Q
[(some other text)] TJ ET Q
Very simple and basic so far...
The second:
[( H T M L E x a m p l e)] TJ ET Q
[( S o m e s p e c i a l c h a r a c t e r s : < ¬ ¬ ¬ & ט ט © > \\ s l a s h \\ \\ d o u b l e - s l a s h \\ \\ \\ t r i p l e - s l a s h )] TJ ET Q
NOTE: It is not noticeable in text above, but:
'H T M L E x a m p l e' is actually 0H0T0M0L0[32]0E0x0a0m0p0l0e where each 0 is a literal value 0 == ((char)0) so if I ignore all the 0 values, this actually turns to be like the upper example...
Some Bytes:
htmlexample == [0, 72, 0, 84, 0, 77, 0, 76, 0, 32, 0, 69, 0, 120, 0, 97, 0, 109, 0, 112, 0, 108, 0, 101]
<content> == [0, 32, 32, -84, 0, 32, 32, -84, 0, 32, 32, -84, 0, 32, 0, 38, 0, 32, 0, -24, 0, 32, 0, -24, 0, 32, 0, -87, 0, 32, 0]
But in the next line I need to combine every two bytes into a char because of the following:
< ¬ ¬ ¬...> is actually <0[32][32]¬0[32][32]¬0[32][32]¬...> where the combination of [32]¬ is €
The problem I'm facing is not the conversion itself. For that I use:
new String(sb.toString().getBytes("UTF-8"),"UTF-16BE")
The problem is to know when to apply it and when to keep the UTF-8.
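For the case where the codes really are two bytes each and map one-to-one to Unicode (which the byte dump of 'HTML Example' above suggests, but which has to be confirmed from the font data), the pairing of bytes can be sketched on the raw bytes directly. This avoids the round trip through sb.toString().getBytes("UTF-8"), which can change the byte sequence for values above 127. TwoByteStringDemo is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: valid when the font says codes are two bytes and equal to Unicode values.
class TwoByteStringDemo {
    // Decodes raw string bytes as big-endian 16-bit units.
    static String decodeUtf16Be(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] htmlExample = {0, 72, 0, 84, 0, 77, 0, 76, 0, 32, 0, 69, 0, 120,
                              0, 97, 0, 109, 0, 112, 0, 108, 0, 101};
        System.out.println(decodeUtf16Be(htmlExample)); // HTML Example
        // The byte pair [32, -84] is 0x20AC, the euro sign.
        System.out.println(decodeUtf16Be(new byte[] {32, -84}).equals("\u20AC")); // true
    }
}
```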
== UPDATE ==
The font used for the problematic Object is:
#7 0# {
'Name' : "F4"
'BaseFont' : "AAAAAE+DejaVuSans-Bold"
'Subtype' : "Type0"
'ToUnicode' : #41 0# {
'Filter' : "FlateDecode"
'Length' : 1679.0f
} + Stream(5771 bytes)
'Encoding' : "Identity-H"
'DescendantFonts' : [#42 0# {
'FontDescriptor' : #43 0# {
'MaxWidth' : 2016.0f
'AvgWidth' : 573.0f
'FontBBox' : [-1069.0f, -415.0f, 1975.0f, 1174.0f]
'MissingWidth' : 600.0f
'FontName' : "AAAAAE+DejaVuSans-Bold"
'Type' : "FontDescriptor"
'CapHeight' : 729.0f
'StemV' : 60.0f
'Leading' : 0.0f
'FontFile2' : #34 0# {
'Filter' : "FlateDecode"
'Length1' : 83036.0f
'Length' : 34117.0f
} + Stream(83036 bytes)
'Ascent' : 928.0f
'Descent' : -236.0f
'XHeight' : 547.0f
'StemH' : 26.0f
'Flags' : 32.0f
'ItalicAngle' : 0.0f
}
'Subtype' : "CIDFontType2"
'W' : [32.0f, [348.0f, 456.0f, 521.0f, 838.0f, 696.0f, 1002.0f, 872.0f, 306.0f, 457.0f, 457.0f, 523.0f, 838.0f, 380.0f, 415.0f, 380.0f, 365.0f], 48.0f, 57.0f, 696.0f, 58.0f, 59.0f, 400.0f, 60.0f, 62.0f, 838.0f, 63.0f, [580.0f, 1000.0f, 774.0f, 762.0f, 734.0f, 830.0f, 683.0f, 683.0f, 821.0f, 837.0f, 372.0f, 372.0f, 775.0f, 637.0f, 995.0f, 837.0f, 850.0f, 733.0f, 850.0f, 770.0f, 720.0f, 682.0f, 812.0f, 774.0f, 1103.0f, 771.0f, 724.0f, 725.0f, 457.0f, 365.0f, 457.0f, 838.0f, 500.0f, 500.0f, 675.0f, 716.0f, 593.0f, 716.0f, 678.0f, 435.0f, 716.0f, 712.0f, 343.0f, 343.0f, 665.0f, 343.0f, 1042.0f, 712.0f, 687.0f, 716.0f, 716.0f, 493.0f, 595.0f, 478.0f, 712.0f, 652.0f, 924.0f, 645.0f, 652.0f, 582.0f, 712.0f, 365.0f, 712.0f, 838.0f], 160.0f, [348.0f, 456.0f, 696.0f, 696.0f, 636.0f, 696.0f, 365.0f, 500.0f, 500.0f, 1000.0f, 564.0f, 646.0f, 838.0f, 415.0f, 1000.0f, 500.0f, 500.0f, 838.0f, 438.0f, 438.0f, 500.0f, 736.0f, 636.0f, 380.0f, 500.0f, 438.0f, 564.0f, 646.0f], 188.0f, 190.0f, 1035.0f, 191.0f, 191.0f, 580.0f, 192.0f, 197.0f, 774.0f, 198.0f, [1085.0f, 734.0f], 200.0f, 203.0f, 683.0f, 204.0f, 207.0f, 372.0f, 208.0f, [838.0f, 837.0f], 210.0f, 214.0f, 850.0f, 215.0f, [838.0f, 850.0f], 217.0f, 220.0f, 812.0f, 221.0f, [724.0f, 738.0f, 719.0f], 224.0f, 229.0f, 675.0f, 230.0f, [1048.0f, 593.0f], 232.0f, 235.0f, 678.0f, 236.0f, 239.0f, 343.0f, 240.0f, [687.0f, 712.0f, 687.0f, 687.0f, 687.0f, 687.0f, 687.0f], 247.0f, [838.0f, 687.0f], 249.0f, 252.0f, 712.0f, 253.0f, [652.0f, 716.0f]]
'Type' : "Font"
'BaseFont' : "AAAAAE+DejaVuSans-Bold"
'CIDSystemInfo' : {
'Supplement' : 0.0f
'Ordering' : "Identity" + Stream(8 bytes)
'Registry' : "Adobe" + Stream(5 bytes)
}
'DW' : 600.0f
'CIDToGIDMap' : #44 0# {
'Filter' : "FlateDecode"
'Length' : 10200.0f
} + Stream(131072 bytes)
}]
'Type' : "Font"
}
There is no indication of the encoding type of the font.
== Update ==
As for the ToUnicode object, in the case of this font it is unnecessary. It should have been Identity-H, but instead it is an explicit X == X mapping. Here are some of the entries, which run up to FFFF:
<0000> <00ff> <0000>
<0100> <01ff> <0100>
<0200> <02ff> <0200>
<0300> <03ff> <0300>
<0400> <04ff> <0400>
<0500> <05ff> <0500>
<0600> <06ff> <0600>
<0700> <07ff> <0700>
<0800> <08ff> <0800>
<0900> <09ff> <0900>
<0a00> <0aff> <0a00>
<0b00> <0bff> <0b00>
<0c00> <0cff> <0c00>
<0d00> <0dff> <0d00>
<0e00> <0eff> <0e00>
<0f00> <0fff> <0f00>
<1000> <10ff> <1000>
<1100> <11ff> <1100>
....
....
....
<fc00> <fcff> <fc00>
<fd00> <fdff> <fd00>
<fe00> <feff> <fe00>
<ff00> <ffff> <ff00>
So the mapping is not in the ToUnicode object, yet other renderers still render it correctly!
Any ideas?
I use: new String(sb.toString().getBytes("UTF-8"),"UTF-16BE")
The problem is to know when to apply it and when to keep the UTF-8.
The OP assumes, probably after examining some sample PDF files, that strings in PDF content streams are encoded using either UTF-8 or UTF-16BE.
This assumption is wrong.
PDF allows some standard single-byte encodings (MacRomanEncoding, MacExpertEncoding, and WinAnsiEncoding), none of which is UTF-8 (because of the relations between different encodings, especially ASCII, Latin1, and UTF-8, they may be confused with each other when only a limited sample is examined). Furthermore, numerous predefined multi-byte encodings are also allowed, some of which are indeed UTF-16-related.
But PDF allows completely custom encodings, both single-byte and multi-byte, to be used, too!
E.g. this text drawing operation
(ABCCD) Tj
for a simple font with this encoding:
<<
/Type /Encoding
/Differences [ 65 /H /e /l /o ]
>>
displays the word Hello!
And while this may look like an artificially constructed example, the procedure to create a custom encoding like this (i.e. by assigning codes from some start value upwards to glyphs in the order in which they first occur on the page or in the document) is fairly often used.
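The custom-encoding example above can be traced in a few lines of Java. This is only a sketch of the lookup for that one example: DifferencesDemo and its methods are hypothetical names, and a real parser would map glyph names to Unicode through a glyph list rather than using the name as the text.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Sketch only: reproduces the /Differences [ 65 /H /e /l /o ] example.
class DifferencesDemo {
    // Builds code -> glyph name from a start code followed by glyph names
    // that are assigned to consecutive codes.
    static Map<Integer, String> differences(int startCode, String... glyphNames) {
        Map<Integer, String> map = new HashMap<>();
        for (int i = 0; i < glyphNames.length; i++) {
            map.put(startCode + i, glyphNames[i]);
        }
        return map;
    }

    // Looks up each single-byte code in the encoding.
    static String decode(byte[] stringBytes, Map<Integer, String> encoding) {
        StringBuilder out = new StringBuilder();
        for (byte b : stringBytes) {
            // Glyph names here are single letters, so the name doubles as the text.
            out.append(encoding.getOrDefault(b & 0xFF, "?"));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> encoding = differences(65, "H", "e", "l", "o");
        byte[] shown = "ABCCD".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decode(shown, encoding)); // Hello
    }
}
```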
Furthermore, the OP's current solution
If your font object has a CMap, then you treat it as a UTF-16, otherwise not.
will only work for a very few documents because
a) simple fonts (using single-byte encodings) may also supply a ToUnicode CMap and
b) composite fonts CMaps also need not be UTF-like but instead can use a mixed multi-byte encoding.
Thus, there is no way around an in-depth analysis of the used font information, cf. 9.5..9.9 of the PDF specification ISO 32000-1.
PS On some comments by the OP:
this: new String(sb.toString().getBytes("UTF-8"),"UTF-16BE") was an example to the how the problem is solved not a solution! The solution is done while fetching the glyphs whether I treat the data as 16-bit or 8-bit
and
the ToUnicode map is 16-bit(The only ones I've seen) per key,
The data may be mixed data. For example, have a look at the Adobe CMap and CIDFont Files Specification, where CMap example 9 contains the section
4 begincodespacerange
<00> <80>
<8140> <9ffc>
<a0> <de>
<e040> <fbec>
endcodespacerange
which is explained to mean
Figure 6 shows how the codespace definition in this example comprises two single-byte linear ranges of codes (<00> to <80> and <A0> to <DF>) and two double-byte rectangular ranges of codes (<8140> to <9FFC> and <E040> to <FBFC>). The first two-byte region comprises all codes bounded by first-byte values of 81 through 9F and second-byte values of 40 through FC. Thus, the input code <86A9> is within the region because both bytes are within bounds. That code is valid. The input code <8210> is not within the region, even though its first byte is between 81 and 9F, because its second byte is not within bounds. That code is invalid. The second two-byte region is similarly bounded.
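The rectangular range test described in that explanation can be sketched as follows. The bounds are copied from the codespacerange section as quoted above; CodespaceDemo, Range and isValid are hypothetical names, and a real CMap parser would read the ranges from the stream instead of hard-coding them.

```java
// Sketch only: checks codes against the codespace ranges quoted above.
class CodespaceDemo {
    // One codespace range: low and high codes of the same byte length.
    static final class Range {
        final int[] low;
        final int[] high;

        Range(int[] low, int[] high) {
            this.low = low;
            this.high = high;
        }

        // A code matches when it has the same length and every byte lies
        // within the bounds for its position (a "rectangular" range).
        boolean contains(int[] code) {
            if (code.length != low.length) {
                return false;
            }
            for (int i = 0; i < code.length; i++) {
                if (code[i] < low[i] || code[i] > high[i]) {
                    return false;
                }
            }
            return true;
        }
    }

    static final Range[] RANGES = {
        new Range(new int[] {0x00}, new int[] {0x80}),
        new Range(new int[] {0x81, 0x40}, new int[] {0x9f, 0xfc}),
        new Range(new int[] {0xa0}, new int[] {0xde}),
        new Range(new int[] {0xe0, 0x40}, new int[] {0xfb, 0xec}),
    };

    // Returns true if the code (one int per byte) falls in any range.
    static boolean isValid(int... code) {
        for (Range r : RANGES) {
            if (r.contains(code)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isValid(0x86, 0xA9)); // true
        System.out.println(isValid(0x82, 0x10)); // false: second byte below 0x40
        System.out.println(isValid(0x41));       // true: single-byte range
    }
}
```

A decoder for such a font has to try the ranges while reading the string, because whether the next code is one or two bytes long depends on the value of its first byte.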
OK, so this turned out to be complicated, and the reason for my bug was stupid, especially on my end, but there is a lesson to be learned about when to treat the chars as UTF-16 and when not to.
My problem was not in parsing the fonts but in rendering them. From the details specified in the Font object you can determine the type of the font and apply the correct logic to it.
