EDI Hex Character Issues - java

So we're exporting records from a Filemaker Pro database and submitting those files to our vendor via EDI, but their system cannot accept our files because of some really wonky ".." (Hex 0B) characters that show up at the beginning of every new line in the text document.
I've read that there are issues with how FileMaker Pro exports files, and that because of this the Hex 0B characters can't be deleted beforehand from within FileMaker. I have a limited understanding of Java; could an adequate Java-based run-time solution be created to fix this problem?
I've tried exporting the text file in question with every available export method, including .mer, .csv, .tab, etc., and with every output character set available for each export type; some of them create even more issues with hidden characters. I settled on .tab (ASCII [DOS]) as it left the least residual information, but I still need to get rid of these 0B characters so that our EDI integration partner can accept our files without any issues.
EDIT: Added more information for clarity.

The hex value "0B" in ASCII is a vertical tab.
This chart is a good reference point for figuring out the hex values for specific ascii values and vice versa.
Getting the vertical tab out of a text field is a bit irritating but doable. I had to do it for a series of documents we imported into FileMaker as text and then moved to a PostgreSQL database. Here is what I did:
Download the free application Hex Fiend (this app lets you enter text and see its hex value, or enter hex and see its text equivalent)
Enter the value 0B into the left panel of Hex Fiend (the hex side)
Click into the right panel of Hex Fiend (the text side) and then click the menu command Edit -> Select All to select the entire contents of the text side and then Edit -> Copy to copy the contents (your vertical tab text character) to the clipboard
In FileMaker, create a script with script step:
Set Field[myTable::myField ; Substitute ( myTable::myField ; "*pasteVerticalTabCharacterHere*" ; "" )]
Paste your vertical tab character in between the first set of quotes of the substitute function
Note that when you do this, your substitute function will probably break to two lines
Leave the second set of quotes in the substitute function blank
When you run this script on a record whose myField contains vertical tabs, it will replace them with an empty string "". You could also change the Substitute function to replace the vertical tab with a carriage return, like so:
Set Field[myTable::myField ; Substitute ( myTable::myField ; "*pasteVerticalTabCharacterHere*" ; ¶ )]
If you have a number of records that need this Set Field step, you could create a script that loops through all of them and runs the step on each one.
You could also do it the quick and dirty way of:
Show all records that need to be cleaned
Click into the field to be cleaned
From the Records menu, select Replace Field Contents
Enter the substitute calculation above as the replace calculation
Hit replace
This will go through and clean the vertical tabs out of all of the records in one shot.
In the end, if you know that a particular field or fields are likely to contain vertical tabs, I would suggest giving those fields an auto-enter calculation that uses a modified version of the Substitute calculation above:
Substitute ( Self ; "*pasteVerticalTabCharacterHere*" ; "" )
This way, whenever you import records into the table that contain vertical tabs, the vertical tab character is cleaned out automatically.
I should also note that Hex Fiend is a Mac app. I think on a Windows computer you can type the vertical tab character by pressing Ctrl+K, but I haven't tried it.
Hope this helps!
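For the Java run-time route the question asks about, a minimal sketch could post-process the exported file before it is sent to the EDI partner and simply drop every 0x0B byte. It works on raw bytes, so it assumes a single-byte or UTF-8 export (such as the .tab ASCII [DOS] option mentioned in the question), where the byte 0x0B can only mean a vertical tab; it would not be correct for a UTF-16 export. The class name `VerticalTabStripper` is made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/** Minimal sketch: removes vertical-tab bytes (0x0B) from an exported text file. */
class VerticalTabStripper {

    /** ASCII vertical tab (decimal 11). */
    static final byte VERTICAL_TAB = 0x0B;

    /** Returns a copy of data with every 0x0B byte dropped; all other bytes are kept. */
    static byte[] strip(byte[] data) {
        byte[] out = new byte[data.length];
        int n = 0;
        for (byte b : data) {
            if (b != VERTICAL_TAB) {
                out[n++] = b;
            }
        }
        return Arrays.copyOf(out, n);
    }

    /** Usage: java VerticalTabStripper <exported file> <cleaned file> */
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java VerticalTabStripper <in> <out>");
            return;
        }
        byte[] exported = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), strip(exported));
    }
}
```

Running `java VerticalTabStripper export.tab clean.tab` after each export would leave tabs and DOS line endings untouched and only remove the 0x0B bytes. If you would rather keep a visible break where each vertical tab was, the loop could write a replacement byte instead of skipping it.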

Related

Disable Intellij IDEA append space after code autocomplete

An example: when I want to type public, I type pub and press Tab, and IDEA automatically appends a space after public. How can I turn this off?
Because I am used to pressing space after autocompletion, every time this happens I end up with 2 spaces and have to delete one. Other IDEs and text editors I've used don't seem to append that space automatically.
You cannot disable this particular piece of autocomplete behavior in IDEA short of disabling the autocompletion feature altogether.
You can, however, use the 'Reformat Code' action (Ctrl+Alt+L) to apply the default formatting to your whole file, or the 'Complete Current Statement' action after you are done typing a construct (highlight the statement and press Ctrl+Shift+Enter). Once the code conforms to the default formatting settings, double spaces should be replaced with single ones.
Assuming that this is for Java code (although the general mechanism is true for most file types), you can modify when/how spaces are used in code.
Go to File | Settings | Code Style | Java.
If you then click on the Spaces tab you can specify the code layout you want. After you've done this if you reformat your code it should format according to your preferences.
You can specify this for other types too (General, CSS, JavaScript and so on).

How to determine coordinates of multi-line text in PDF

I am using Apache PDFBox 2.0 to parse a PDF file. Given some fixed strings, I was able to create a system based on:
A fixed text, as a starting point
The next cell/text position, or null
The bottom area, to determine the height of the rectangle.
Using the starting point, I compute the x and y coordinates of the region.
Using the "next" text block (which is another fixed value, for example a field or a table header), I am determining the width of the desired region, using formula:
width = second.x - first.x
or something similar. So, in a table, for example, knowing the header names in advance, it's easy to detect the columns. What I am trying to do (and have so far failed to do accurately) is to determine the lines in a PDF table. This table sometimes has missing values in some columns and multi-line values in some rows/columns. I have extended my "system" (first, next, bottom) to work dynamically with table rows, and this works great with "normalized" tables (e.g. no whitespace and, at least, no multi-line values). But it doesn't work with real-world data, because so far I have not found a way to determine the location (x, y, width, height) of a multi-line value. Is this even possible with PDFBox? Some people suggested converting the PDF to HTML first and then parsing the HTML instead. Is this a viable option? Has anyone worked with this library? I will try to use it next.
Like I said in my previous comments, I have found a partial solution for my issue. This is based on two things:
First, I assume that one column for each table contains only distinct values which never occupy more than 1 row.
Next, since I also have some fixed texts in the document, I have determined the coordinates of these texts and use them as delimiters of the area that contains the text I want to extract. For example, the "current, next, bottom" system (as I call it) can contain "Column name A", "Column name B", "Fixed text C" (or the second row of the same table, determined from the unique single-row values).
It is not perfect, and problems may occur if a fixed text appears more than once in the document. Of course, it can be improved by filtering for the correct occurrence using the vertical coordinates and so on, but for the moment I will close this question, as it seems this problem has no standard answer and there is currently no open source library able to extract tabular data from PDFs.
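A rough sketch of the row-delimited idea described above, written in plain Java so it runs without a PDF: given glyph boxes, it keeps the ones inside a cell's column bounds (this header's x up to the next header's x) and row bounds (this row's anchor up to the next row's anchor, taken from the single-row key column), then returns their combined bounding box and the text split into lines. `Glyph`, `Cell` and `CellLocator` are hypothetical types. In PDFBox 2.0 the glyph values would typically come from the `TextPosition` objects passed to an overridden `PDFTextStripper.writeString(String, List<TextPosition>)`, roughly `getXDirAdj()`, `getYDirAdj() - getHeightDir()` for the top, `getWidthDirAdj()` and `getHeightDir()`; treat that mapping and the `lineTolerance` value as assumptions to check against your own documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical glyph box: top-left corner (y grows downward) plus size, in PDF points. */
class Glyph {
    final String text;
    final float x, y, width, height;

    Glyph(String text, float x, float y, float width, float height) {
        this.text = text;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }
}

/** Hypothetical result: a cell's bounding box and its text, one string per line. */
class Cell {
    final float x, y, width, height;
    final List<String> lines;

    Cell(float x, float y, float width, float height, List<String> lines) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
        this.lines = lines;
    }
}

class CellLocator {

    /**
     * Collects glyphs whose left edge lies in [left, right) and whose top lies in
     * [top, bottom), groups them into lines (tops closer than lineTolerance share
     * a line) and returns the union of their boxes. Returns null for an empty cell.
     */
    static Cell locate(List<Glyph> glyphs, float left, float right,
                       float top, float bottom, float lineTolerance) {
        List<Glyph> inCell = new ArrayList<>();
        for (Glyph g : glyphs) {
            if (g.x >= left && g.x < right && g.y >= top && g.y < bottom) {
                inCell.add(g);
            }
        }
        if (inCell.isEmpty()) {
            return null; // missing value in this column/row
        }

        // Group into lines: sort top-to-bottom, start a new line on a vertical jump.
        inCell.sort(Comparator.comparingDouble((Glyph g) -> g.y));
        List<List<Glyph>> rows = new ArrayList<>();
        float lineTop = 0f;
        for (Glyph g : inCell) {
            if (rows.isEmpty() || g.y - lineTop > lineTolerance) {
                rows.add(new ArrayList<>());
                lineTop = g.y;
            }
            rows.get(rows.size() - 1).add(g);
        }

        // Read each line left-to-right and grow the bounding box as we go.
        float minX = Float.MAX_VALUE, minY = Float.MAX_VALUE;
        float maxX = -Float.MAX_VALUE, maxY = -Float.MAX_VALUE;
        List<String> lines = new ArrayList<>();
        for (List<Glyph> row : rows) {
            row.sort(Comparator.comparingDouble((Glyph g) -> g.x));
            StringBuilder text = new StringBuilder();
            for (Glyph g : row) {
                text.append(g.text);
                minX = Math.min(minX, g.x);
                minY = Math.min(minY, g.y);
                maxX = Math.max(maxX, g.x + g.width);
                maxY = Math.max(maxY, g.y + g.height);
            }
            lines.add(text.toString());
        }
        return new Cell(minX, minY, maxX - minX, maxY - minY, lines);
    }
}
```

Because the bottom of each cell is the next row's anchor rather than a fixed line height, a value that wraps onto a second line still falls inside the cell, and the returned height reflects how many lines it actually used.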

Format specific part of Java code in Eclipse Mars

I am able to format my Java code by configuring it in Save Actions.
After saving the file, the whole code gets formatted according to my settings. What I need is for only part of the code to be formatted according to the settings.
Say there are 10 methods in my code and I add one more. I want only the 11th method to be formatted, and the previous 10 to remain untouched.
Is that possible? I am using Eclipse Mars.
NOTE:
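The formatting includes removing unnecessary casts, parentheses, etc.
Source code formatting on save can be limited to edited lines only: in the Save Actions preferences, under "Format source code", choose "Format edited lines" instead of "Format all lines".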
The other save actions however are applied to the whole file.
Select the text you want to format, and press Ctrl + Shift + F to format the selection.
Alternatively you can do Ctrl + I on the selected text to just correct the indentation.
I always use Ctrl+Shift+F to format my code. In Eclipse you can just select the code you want with the mouse, then press Ctrl+Shift+F. That way only the selected code is formatted.
Normally, Ctrl+Shift+F formats the current class, but when a selection is made, only the selected part is formatted.
To make a quick selection from the current position, use Alt+Shift+Up Arrow to expand it to the enclosing element, or Alt+Shift+Down Arrow to restore the previous selection.
Telling Eclipse "manually" which parts of your code to format and which not can be a daunting task. Especially if others work on the same project and don't know which parts of the code have to be left out.
Because of this, Eclipse allows you to set markers. Simply surround your code with these tags:
// @formatter:off
Here goes your code
// @formatter:on
To make this work, you have to enable the tags in the Java code formatter settings (Window / Preferences / Java / Code Style / Formatter, click Edit, then on the Off/On Tags tab check "Enable Off/On tags").
As far as I know this has been in Eclipse since version 3.6.
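A small example of the markers in use, assuming the Off/On tags are enabled with Eclipse's default tag text. The hand-aligned matrix between the markers should be left exactly as written when the file is formatted, while the method below it is formatted normally; the `Transforms` class and its contents are made up for illustration.

```java
/** Illustration only: a hand-aligned table protected from the formatter. */
class Transforms {

    // Without the markers, Ctrl+Shift+F may reflow these rows and lose the alignment.
    // @formatter:off
    static final int[][] ROTATE_90 = {
        { 0, -1,  0 },
        { 1,  0,  0 },
        { 0,  0,  1 },
    };
    // @formatter:on

    /** Multiplies the 3x3 matrix m by the column vector v (this part is formatted normally). */
    static int[] apply(int[][] m, int[] v) {
        int[] out = new int[3];
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 3; col++) {
                out[row] += m[row][col] * v[col];
            }
        }
        return out;
    }
}
```

Since the markers are ordinary comments, they travel with the source file, so anyone on the project who enables the same setting gets the same result.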

What's going on with CheckStyle's line length check?

I have CheckStyle set to check for lines over 80 characters in Eclipse, and I have a margin line set up in my editor at 80 characters. If I put my cursor at the end of a line of code in my editor, the co-ordinates read (for example) 1433, 77, indicating the 77th character column from the left - yet when I run CheckStyle over the same line it says the line is 88 characters long! There are no extraneous tabs or other whitespace characters at the end of the line, it's definitely 77 long. Is CheckStyle broken?
From the Checkstyle documentation:
The calculation of the length of a line takes into account the number of expanded spaces for a tab character ('\t'). The default number of spaces is 8. To specify a different number of spaces, the user can set TreeWalker property tabWidth which applies to all Checks, including LineLength; or can set property tabWidth for LineLength alone.
http://checkstyle.sourceforge.net/config_sizes.html#LineLength
To do this in Eclipse:
Open Window->Preferences from the Menu.
Select Checkstyle.
Type TreeWalker into the search box under Known modules.
Select TreeWalker on the list on the right.
Click Open.
Change the tabWidth to 4.
Click OK, and OK again.
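To see where the extra columns come from, here is a minimal sketch of the tab-stop counting described in the Checkstyle documentation quoted above: each tab advances the column to the next multiple of the tab width. This is an illustration written for this answer, not Checkstyle's own code, and `ExpandedLength` is a made-up name.

```java
/** Counts a line's length the way a tab-aware line-length check does. */
class ExpandedLength {

    static int expandedLength(String line, int tabWidth) {
        int column = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '\t') {
                // A tab jumps to the next tab stop (next multiple of tabWidth).
                column = (column / tabWidth + 1) * tabWidth;
            } else {
                column++;
            }
        }
        return column;
    }

    public static void main(String[] args) {
        String line = "\t\t\tint total = price * quantity;";
        System.out.println(expandedLength(line, 4)); // 41, as an editor with 4-wide tabs counts it
        System.out.println(expandedLength(line, 8)); // 53, with Checkstyle's default of 8
    }
}
```

Three leading tabs alone account for a 12-column difference between the two settings, which is the kind of gap the question describes.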
Is CheckStyle broken?
Probably not.
I expect that you/Eclipse and CheckStyle have different ideas about the width of a TAB character. It sounds like you think it means 4 spaces, whereas CheckStyle thinks it means 8 (its default).
One way around this is to configure Eclipse to not use TAB characters in your source file, then re-indent your source files.
Another way is to make sure that CheckStyle and Eclipse agree on the TAB width; e.g. see Martin Ellis's answer.
(I prefer the first approach because it means that my source code will look correctly indented, irrespective of the platform's default TAB width; i.e. Windows versus Linux/Unix. Hard TAB characters in source code are a bad idea.)

Why does my source code (written in Eclipse) look different in other text editors?

I've been using Eclipse to do my CS assignments, as recommended by my professor. However, I've noticed that if I open my source code in a different text editor, my beautiful, perfect formatting looks wrong. I believe the problem lies in tabs. A tab character seems to take up less space in Eclipse than in other text editors.
A good chunk of our grade is determined by the neatness of our code. I'm not sure whether our programs are graded in Eclipse or not, so I'd like to figure out how to make source code have the same formatting regardless of text editor.
Is this a problem with Eclipse? Are there settings I can fiddle with?
This is probably due to your tab settings. If you really want to indent using tabs, make sure the tab width is set to 8 spaces everywhere.
From the Java coding convention:
Four spaces should be used as the unit of indentation. The exact construction of the indentation (spaces vs. tabs) is unspecified. Tabs must be set exactly every 8 spaces (not 4).
Personally I always use spaces to indent my code due to the fact that some people have their tab symbol set to show as 4 spaces.
To set Eclipse to always use spaces, go to
Window -> Preferences -> Java -> Code style -> Formatter -> Edit
and set Tab policy to Spaces Only.
The tab size is probably different in other editors. In its raw form it is still a \t character; how an editor displays it is usually controlled by a preference. Either way, it should be consistent in size across the whole file.
You could also convert tabs to spaces so it is always the same regardless of editor.
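Converting tabs to spaces can be done with Eclipse's Spaces only policy plus a reformat, but as a standalone illustration, here is a minimal sketch that expands each tab to the next tab stop so the file looks the same in any editor. `Detab` is a hypothetical name, the default width of 4 is an assumption, and the file is rewritten in place as UTF-8 with the platform's line separator.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch: converts tabs to spaces so every editor shows the same layout. */
class Detab {

    /** Replaces each tab with enough spaces to reach the next multiple of tabWidth. */
    static String expandTabs(String line, int tabWidth) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\t') {
                // out.length() is the current column, since a single line has no newlines.
                int spaces = tabWidth - (out.length() % tabWidth);
                for (int s = 0; s < spaces; s++) {
                    out.append(' ');
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Usage: java Detab <source file> [tab width, default 4] */
    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args[0]);
        int tabWidth = args.length > 1 ? Integer.parseInt(args[1]) : 4;
        List<String> converted = new ArrayList<>();
        for (String line : Files.readAllLines(file)) {
            converted.add(expandTabs(line, tabWidth));
        }
        Files.write(file, converted);
    }
}
```

Expanding to tab stops rather than replacing every tab with a fixed number of spaces keeps mid-line alignment (for example, aligned comments) intact.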
The tab character does not have a defined display width. Notepad displays it as 8 spaces wide. All code editors should allow the viewer to change the displayed tab width. The convention for code is normally a width equivalent to 4 spaces.
If you're desperate, you could replace all tabs with 4 spaces. However, this is frowned upon by some, and may lose you marks. I'm pretty sure that the advised Java coding style advocates the use of tabs, not spaces.
At the end of the day, the marker is a fool if they turn down syntax highlighting and the code editor that comes with it.
