modify pbit data source in a program - java

I am trying to modify a pbit archive in java in my application.
The point is to update the data source of a pbit without using any Power BI application, so I have to modify the DataModelSchema entry.
My problem is that when I read the file with an InputStream and print it to the console, there are blank spaces between each letter, so I am not able to search and replace the right string, even when I add the blank spaces to my search string artificially.
For example, whether I search for "content" or "c o n t e n t", it is never found.
That problem never appears when I read "normal" .zip archives.
An overview of my output when I read the file, with the additional blank spaces:
https://community.powerbi.com/t5/Service/modify-pbit-data-source-in-a-program/m-p/1744747#M124339
So I would like to know whether there is a special encoding in pbit templates that might add these spaces, and whether there is a way to read the file properly.
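Blank spaces between every letter are a classic sign of UTF-16 text being printed as if it were single-byte text: the zero bytes of the two-byte characters render as spaces. The DataModelSchema entry inside a pbit is reportedly UTF-16LE encoded, so decoding it with the right charset makes it searchable as normal text. A minimal sketch, assuming the entry is UTF-16LE (the file path in any caller is of course yours to supply):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class PbitReader {
    /** Reads a zip entry and decodes it as UTF-16LE instead of the
     *  platform default charset. */
    public static String readEntryAsUtf16(String zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("Entry not found: " + entryName);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                byte[] bytes = in.readAllBytes();
                // Decoding as UTF-16LE removes the "spaces" between letters,
                // which are really the zero bytes of two-byte characters.
                return new String(bytes, StandardCharsets.UTF_16LE);
            }
        }
    }
}
```

With the entry decoded this way, searching for "content" behaves normally; when writing a modified schema back, the bytes would need to be re-encoded as UTF-16LE as well.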
Thanks for your help,
Regards
J.MARQUE

Just to clarify, in case someone has the same problem: it appears that it is not possible (for now) to update the content of the data schema of a pbit template directly; the only way seems to be the Power BI REST API.

Related

Extract flat files contents into individual words and store into database

I've done a lot of internet searching to no avail; hopefully you can help me.
I want to take a flat file with normal content (i.e. full English sentences, paragraphs, etc.), extract each word, and store each word individually, one word per row, in a SQL database (it doesn't matter if spaces are lost, but characters such as apostrophes should be kept).
I then want an HTML page with code that accesses this DB and outputs the text to the user one word at a time, essentially 'writing' the input file's text word by word on the web page.
This is just a coding exercise, but I am frustrated because I know the what but not the how, and I am not sure where to start. Please note that some of these files can be quite big (~20,000 words), so any solution may need to consider performance.
TL;DR: I want to extract individual words from a text file with normal everyday sentences into a SQL DB that I can retrieve from an HTML page.
Simple read & split exercise:

with open(<filename>) as f:
    dd = {}
    for ln in f:
        wds = ln.strip().split()
        for word in wds:
            dd[word] = 1  # need something for the value
for wkey in dd:
    <insert into db>
Well, before you start you should choose just one programming language. Since you seem to be a beginner I would highly recommend Python over Java, but it depends on whether you're required to use a particular language by an employer/professor/etc.
Also just to point out, this is also a very BIG task that you've chosen. I'll try to break it down into parts for you, but I recommend starting with just one of these parts before you move on, and make sure it works on your local machine before you try putting it on the web.
First you need something to read in your file, preferably line by line. A method similar to FileReader/BufferedReader in Java, or the open() and readlines() functions in Python, will do this. I would also check out the online tutorials on file handling for whichever of these two languages you're going to use. The Python one is here. Practice this with a test file or a small section of your real file before you start working on your real input files.
When you start processing the lines from the file, I would recommend splitting them into individual words using a string split function on spaces or on any punctuation, such as ,.!?". This way you'll pull out the individual words from each line in the file.
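In Java, the reading and splitting steps described above might be sketched like this (a minimal sketch; the punctuation character class is an assumption you may want to tune, and apostrophes are deliberately kept, as the question asks):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class WordExtractor {
    /** Reads a file line by line and splits each line into words on
     *  whitespace and common punctuation (apostrophes are kept). */
    public static List<String> extractWords(String path) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.split("[\\s,.!?\"]+")) {
                    if (!word.isEmpty()) {
                        words.add(word); // here you would instead insert into the DB
                    }
                }
            }
        }
        return words;
    }
}
```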
Next, you'll want to choose a database API for the appropriate programming language. I used PyMySQL but there is also MySQLDB for Python. In Java there is JDBC.
You'll need to then build your database on a server somewhere, preferably on the same server as your HTML page for ease of connection. You'll want to practice connecting to your database and adding sample rows before you start trying to process your real input files.
You can't have normal HTML access the database directly - you'll need to use a coding language like Python for that. I've never used Java for webpages, but with Python you'll simply output text and tell the server to display it as the webpage. This will do the trick:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import otherstuffhere
## Must have this header to tell browser how to handle this output
## and must be printed first
print ("Content-Type: text/html\n\n")
## Connect to database here
## Your code to display words from the database goes below here
print (myfield1)
Also remember that when you output your text, you'll need to add all the HTML tags to the normal text output. For example, when printing each word, you'll need to add <p> or <br> to end each line, because although the Python print() function will automatically add a line break, this doesn't translate to a line break in HTML. For example:
print ("My word list is: <br>")
for word in dbOutputList:
    print (word)
    print ("<br>")
After that the REAL fun/crying begins, but you should work on the above before you move on.

SAX Parser, how to dynamically ignore & in input xml file for SAXParser.parse

I have a scenario where I get an XML from another service and I parse this file and render it to another file.
But sometimes we get a & in the input file inside a tag, and when we try to parse the file we get a SAXException.
Is there a way we can dynamically replace &, or we can ignore the & sign while parsing?
After doing a bit of research I have come up with the following points:
A SAX parser needs a clean XML file without any errors, otherwise it will fail, and we cannot change characters dynamically in the input. So we need to check the input XML file beforehand.
To change characters in the input file easily, use "StringEscapeUtils.escapeXml" provided by Apache in the "org.apache.commons.lang.StringEscapeUtils" package. But this too has its downside, as it will escape all occurrences of the character. For reference you can check this blog: "http://javarevisited.blogspot.com/2012/09/how-to-replace-escape-xml-special-characters-java-string.html"
But my use case was different: I needed only a particular character removed from the input file. So for that I had to code it from scratch; I had to read the file, check for the desired character, delete it, and write the file back again.
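If the goal is to fix only the bare & characters (those that are not already the start of an entity reference) while leaving valid entities like &amp; untouched, a regex with a negative lookahead is a common pre-processing approach. A sketch (the lookahead covers the five predefined entities plus numeric references; extend it if your input uses custom entities):

```java
public class XmlAmpFixer {
    /** Escapes ampersands that are not already part of an entity
     *  reference, so a SAX parser will accept the document. */
    public static String escapeBareAmpersands(String xml) {
        return xml.replaceAll(
            "&(?!(amp|lt|gt|apos|quot|#\\d+|#x[0-9a-fA-F]+);)",
            "&amp;");
    }
}
```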

ORACLE BPEL Adding whitespace after newlines

As part of a larger task we need to extract information from fixed-field-length text files. The data files were originally developed for EDI but are in widespread use throughout the industry, so we can't ask for a more modern way of encoding the data.
I wrote a Java program, implemented as a user defined function, to do the file parsing. It works properly when run locally within jDeveloper. One of the things it does is remove all the newline characters so I can count record lengths accurately. When I try to run it from a simple composite application on the SOA server I find a strange problem: the newlines are gone but I get a new whitespace character in their place.
I do not know what can cause this or how to reliably deal with it. My composite is very simple; I just paste the file content into the "input" field using the test console on the SOA server and this sends the string to the user defined function which parses the file and outputs an XML fragment I can then read.
If I manually strip all the newlines and paste that in, all is well and it works fine, but if I send the data with newlines I get extra whitespace.
Is the composite trying to normalize newlines for me? If so; I would like to know how to make it quit.
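One possibility is that the server normalizes line endings to a different character, such as NEL (U+0085) or LINE SEPARATOR (U+2028), which an \n-only removal would miss. Java's \R regex construct matches any Unicode line break, so stripping \R instead of only \n may be more robust. A hedged sketch, assuming the extra character is one of the Unicode line terminators:

```java
public class RecordCleaner {
    /** Removes every Unicode line terminator (\n, \r\n, NEL, LS, PS, ...),
     *  not just \n, so fixed-width offsets stay accurate even when the
     *  server has normalized the line endings. */
    public static String stripLineBreaks(String input) {
        // \R is Java's "any Unicode linebreak sequence" construct
        return input.replaceAll("\\R", "");
    }
}
```

If the substituted character turns out not to be a line terminator at all, dumping the character codes of the received string would identify it.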

How can I output data with special characters visible?

I have a text file that was provided to me and no one knows the encoding on it. Looking at it in a text editor, everything looks fine, aligned properly into neat columns.
However, I'm seeing some anomalies when I read the data. Even though, visually, the field "Foo" appears in the same columns in the text file (for instance, in columns 15-20), when I try to pull it out using substring(15,20) my data varies wildly. Sometimes I'll pull bytes 11-16, sometimes 18-23, sometimes 15-20...there's no consistency between records.
I suspect that there are some special characters, invisible in my text editor but readable by (and counted in the index of) the String methods. Is there any way in Java to dump the contents of the file with any special characters visible, so I can see which strings I need to replace with a regex?
If not in Java, can anyone recommend a tool that may be able to help me out?
I would start with having a look at the file directly. Any code adds a layer of doubt. Take a Total Commander (or equivalent on your platform), view the file (F3) and switch to hex mode. You suggest that the special characters behavior is not even consistent between lines, so you should get some visual clue about the format before you even attempt to fix it algorithmically.
Have you tried printing the contents of the file as individual integers or bytes? That way you can see if there are any hidden characters.
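The suggestion above can be sketched as a small helper that renders each character next to its numeric code, which makes tabs, NBSPs and other invisible characters stand out immediately:

```java
public class CharDumper {
    /** Renders a string with the code point of every character,
     *  so tabs, NBSPs and other invisible characters stand out. */
    public static String dumpCodes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(c).append('(').append((int) c).append(") ");
        }
        return sb.toString().trim();
    }
}
```

Running each suspect line of the file through this and comparing the codes against what the editor shows should reveal where the extra characters sit.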

Can't filter filenames in Talend Open Studio using regular expressions

This question is about Talend Open Studio code.
I use tSendmail component as a child job, that needs to be run when parent job fails (tFtpPut). However, in tFtpPut, file names are filtered by filename masks (for example, it will upload file named Eedoh, if I put Ee* as a mask), but in tSendMail that's not the case.
I understand that tFtpPut uses special characters from filesystem to make filename masks, and tSendMail should use Java regex. Problem is (as I saw in the source code), List.add(String) function is used to add filenames, so I can not use regex as parameter in .add function.
So, I need way to upload all files with names that match regular expression.
Btw, I have tried to change the source code (iterating over the whole folder and adding all files whose names match the regex), but it didn't help; an error occurred somewhere else and I was not able to track down the issue.
For that problem, I would create a regexpr filter before the components (FTP and sendMail).
It is very easy with a tFilterRow component in "advanced mode". Your filter condition is inputrow.filenamefield.matches("java_regexpr").
This external filter is the same for both components and you don't have anymore to use the specific filter of the FTP component.
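One detail worth noting when translating a filename mask into the tFilterRow condition: String.matches in Java must match the whole string, not just a substring, so a mask like Ee* becomes the regex Ee.* rather than just Ee. A sketch of the equivalent check:

```java
public class MaskFilter {
    /** Java-regex equivalent of the FTP component's Ee* filename mask:
     *  String.matches anchors at both ends, so a trailing .* is needed. */
    public static boolean matchesMask(String filename) {
        return filename.matches("Ee.*");
    }
}
```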
