My company is considering using web services as the means for an ETL process. However, I don't think web services fit this purpose, for several reasons:
1. A web service could consume a lot of memory when generating large XML.
2. XML is a bloated format.
3. The request could time out if the server takes a huge amount of time to generate the data.
4. File size limitations? (For Windows it's 2 GB, if my memory serves me right.)
I am not a web service expert, so I need your opinions. :)
Thanks.
There are plenty of technologies in the Web Services tool shed that circumvent all the problems you list. There is stream-oriented XML shredding, there are XML compression formats for delivery, there are protocols that deal with fragmentation and fairness, and there are many storage systems that can hold terabytes upon terabytes of data.
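To make "stream-oriented XML shredding" concrete, here is a minimal sketch using the JDK's StAX pull parser, which keeps memory flat no matter how large the payload is; the file name and the row element are illustrative assumptions, not anything from the question:

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;

// A minimal sketch of stream-oriented XML processing with StAX:
// the document is pulled event by event, so memory use stays flat
// regardless of the document's size.
public class StreamingShredder {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try (FileInputStream in = new FileInputStream("huge-export.xml")) {
            XMLStreamReader reader = factory.createXMLStreamReader(in);
            long rows = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "row".equals(reader.getLocalName())) {
                    rows++; // process one record at a time here
                }
            }
            System.out.println("Processed " + rows + " rows");
        }
    }
}
```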
If by web service you imagine some college freshman's homework concoction of an interface that accepts a single glop argument with a 2 GB serialized table in it, then all your arguments are valid. But if you give your requirements to an experienced team with knowledge of the concepts involved in WS-ReliableMessaging and WS-Transaction, then there is no reason not to build an ETL process around Web Services. Note that I do not advocate the SOAP protocols per se, but I do advocate knowledge and understanding of the concepts involved.
That being said, whether a Web Service-oriented ETL process makes sense for you depends on a whole set of other factors. However, your rebuttal of the Web Service technologies does not hold water.
I would not use a web service for an ETL task. There are specialized tools for that job (Ab Initio, Informatica, and so on) that are better suited.
If you have a large amount of data, the price of the extra latency the network would introduce is likely to be prohibitive.
It really depends on what you are doing and how you are trying to accomplish it. In general, web services require more care and feeding than you would normally put into an ETL process, but they can be surprisingly effective at the task as well. You haven't given enough specifics about your scenario for me to say whether it would work.
I have worked on web services which transmit and receive 100+ MB documents, some encoded in XML and some not, and do it in seconds (on a closed local network). These services required a good deal of tuning and planning, but they worked well for our scenario, and they allowed a wide variety of clients to connect and transmit differing amounts of data through a fairly standard interface. This differed from some of our other ETL jobs, where the job was specific to each client and had to be set up and maintained for each one.
It all depends on what you are doing and what your constraints are.
If you are going to pursue this route, sit down and draft the process from beginning to end, including how you want clients to connect, verify that the data was received, and verify that the job is finished. Consider some of the scenarios, the clients, and the types of data being transmitted, and then work out what would be needed. Contrast that with what is already available in other tools and with how much time you have to get it done.
I'm really wondering why your company is not considering a real ETL tool, like those mentioned by duffymo in his answer, or Talend or CloverETL if open source is an option.
They are, in general, good for the ETL purpose :)
Building your own solution sounds like reinventing the wheel.
Many of them have web services oriented features (see Export a job as webservice in Talend's wiki or CloverETL Server HTTP Launch Services for example).
I'm not an ETL product expert and I haven't checked them all, but I'm pretty sure this is something to consider.
Look up MTOM, to start with; it allows arbitrary non-XML data to be streamed in a web service.
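As a rough sketch of what an MTOM-enabled endpoint can look like in JAX-WS (the service and method names here are made up for illustration; a deployable service would also need publishing configuration):

```java
import javax.activation.DataHandler;
import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.soap.MTOM;

// With @MTOM, the DataHandler payload travels as a raw binary
// attachment (and can be streamed) instead of being base64-encoded
// inline in the XML envelope.
@MTOM
@WebService
public class BulkTransferService {
    @WebMethod
    public void upload(DataHandler payload) throws Exception {
        // read payload.getInputStream() incrementally here,
        // never materializing the whole document in memory
        long bytes = payload.getInputStream().skip(Long.MAX_VALUE);
        System.out.println("Received " + bytes + " bytes");
    }
}
```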
Web services are just fine for ETL tasks. Remember that each request is handled in its own thread for free, and you're guaranteed proper cleanup between requests. Using web services inside something like Tomcat wouldn't be nearly as heavy as you think.
If you're concerned about the bloat of XML, consider JSON instead.
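If JSON is the route taken, a streaming generator keeps memory flat just as StAX does for XML. A minimal sketch with Jackson's streaming API follows; the file name and field names are invented for illustration:

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import java.io.FileOutputStream;
import java.io.OutputStream;

// Writes one record at a time; the full document is never held
// in memory, so output size is bounded only by disk/network.
public class JsonStreamOut {
    public static void main(String[] args) throws Exception {
        JsonFactory factory = new JsonFactory();
        try (OutputStream out = new FileOutputStream("export.json");
             JsonGenerator gen = factory.createGenerator(out)) {
            gen.writeStartArray();
            for (int i = 0; i < 1_000_000; i++) {
                gen.writeStartObject();
                gen.writeNumberField("id", i);
                gen.writeStringField("status", "ok");
                gen.writeEndObject();
            }
            gen.writeEndArray();
        }
    }
}
```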
I have been working on an algorithm (not mine, I am just modifying it) that predicts when to buy and sell on the FOREX market. I need to be able to open and close orders, dynamically update order parameters (such as stop-loss, maximum stop, etc.), and receive real-time tick data.
I have been researching for well over a week with no success.
The closest I have gotten is using JavoNet and the MT4 API.
I managed to import the DLL into Java and use an MQL4 function, AccountBalance(); however, it returned 0.0, which was not the account balance. I messed around with the code and the settings on the MT4 client, but still no luck.
Q0: Can anyone please point me in the right direction?
I am new to automated FOREX trading, but from what I understand there is a broker somewhere with an MT4 server, and I connect to that server with my MT4 client on my Windows machine.
Q1: If this is the case, do I need to make an API work with the server side instead of my client side?
All the DLLs I have tried so far have been used with the MT4 client software on my machine.
I have also been doing some reading on the FIX-Protocol and ZeroMQ.
Q2: Can these help me achieve my goal in any way (instead of creating some bridges between JAVA and MT4 DLL's)?
A0: Yes. Forget about REST and synchronous, blocking call chains in the FX-trading domain.
A1: Well, not in a typical way. MetaTrader Server is a proprietary suite of systems on the broker side, and its APIs are not disclosed for third-party integrations.
A2: FIX-Protocol is the industry-standard LP-interfacing lingua franca. If you have a contracted relationship with your institutional trading provider, including a FIX-Protocol gateway port, this can give you A-level access to the market against which to integrate your trading tools. If that is the case, forget about MT4 instrumentation, as prime-time cadences are far beyond the MT4 Terminal's localhost processing architecture (multiple events with sub-millisecond time-domain resolution are common, whereas MQL4 provides no direct support for multithreaded-concurrent, let alone parallel, programme scheduling designs). FIX-Protocol event cadences are faster still, well below the millisecond scale.
ZeroMQ may help liberate your further designs from MQL4 limitations. You may like to read my other posts on distributed systems, where MQL4 / ZeroMQ / ML-AI-predictors / GPU-processing infrastructures appear.
Anyway:
Enjoy the Wild Worlds of MQL4/MQL5
Interested? You may also like reading other posts on MQL4, ZeroMQ distributed processing, and low-latency trading.
You can try the MetaApi cloud service (https://metaapi.cloud), which provides REST API and WebSocket API access to both MetaTrader 4 and MetaTrader 5 accounts.
Official REST API documentation: https://metaapi.cloud/docs/client
SDKs: https://metaapi.cloud/sdks (JavaScript, Python, and Java SDKs are provided as of April 2021)
It supports reading account information, positions, orders, trade history, receiving quotes, and accessing market data.
The service also provides a copy-trading API (https://metaapi.cloud/docs/copyfactory) and an API to calculate forex trading metrics on a MetaTrader account (https://metaapi.cloud/docs/metastats).
I started to code an expert advisor in MQL5, naturally on the MT5 platform, and I must admit that the difficulty of managing the application grows quickly with its complexity. It's not only the missing garbage collector, which of course forces you to delete the instances you create, but also that Java offers a set of powerful data structures and syntax that MQL5 naturally doesn't have. Last but not least, regarding the community and the third-party libraries available, there is a light-year of distance between Java and MQL5. For example, if I need a library for JSON conversion, on the Java side I find dozens of official and stable versions; in the MQL5 community I found only rubbish that I had to modify myself.
So, after numerous failed attempts at coding my expert in MQL5 (not a simple one, of course), I decided on a radical approach: an application that is client-side MQL5 and server-side Java, providing a Java facade for the MT5 platform. Same API, same basic events, and so on. Even though I thought more than once that I was heading down a blind alley, I kept coding and eventually made it, obtaining a really solid result.
Naturally, the REST interface drastically reduces performance: each request, even with Tomcat and MT5 running on the same localhost, takes on the order of milliseconds, not microseconds. But this only narrows the architecture's applicability; it doesn't make it useless at all.
Strategies like scalping and other kinds of high-frequency trading are not a good fit for this scenario, but any longer-horizon strategy, even an intraday one, can be implemented successfully without drawbacks.
Last but not least, it isn't necessary to use the MQL5 WebRequest() method to call the servlet container; it is possible to import wininet.dll from the OS (on Windows), and the Strategy Tester will then work as if the strategy had been coded in pure MQL5, maybe just a little bit slower.
To sum up, I wouldn't be so sarcastic about the Java facade approach for FX trading platforms; citing raw performance numbers without contextualizing the overall scenario is a naive way to frame the argument.
If you need to send and receive synchronous messages between MT4 and a Java application, REST would be the best approach, because fast response matters in this scenario. Message-queue solutions like ZeroMQ fit better in asynchronous designs, so they won't help you here. Once you choose the REST approach, you can use the MQL4 WebRequest() function to call your Java application.
WebRequest() isn't the end of the world; you can submit HTTP requests from your EA through an API such as WinINet, and that works even with the Strategy Tester.
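For what the Java side of such a WebRequest()-based bridge might look like, here is a minimal sketch using the JDK's built-in HTTP server; the port and the /order path are assumptions for illustration, not part of any MT4 API:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// A tiny HTTP endpoint the EA's WebRequest() call could POST
// order updates to; the Java side parses and acts on them.
public class Mt4Bridge {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/order", exchange -> {
            byte[] body = exchange.getRequestBody().readAllBytes();
            System.out.println("EA sent: " + new String(body, StandardCharsets.UTF_8));
            byte[] reply = "OK".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(reply);
            }
        });
        server.start(); // blocks in background threads; Ctrl-C to stop
    }
}
```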
To collect tick information and open, update, or close orders, you can use an MT4 server API.
Please check this URL: http://mtapi.online/#overlappable-4
Maybe you will find what you want.
I also have an MT4 server API; if you have any questions, please update me.
I have a case that requires manipulating a large stream of JSON and injecting it into Apache HBase. Our system currently runs on Node.js with MongoDB; since we need to improve performance, HBase was chosen to handle the big-data side.
To improve my system's scalability, I prefer using the Actor Model via Akka for messaging rather than some other message-queue system, because the Actor Model as Akka provides it gives me advantages around fail-safety, actor management, and other features that are very helpful for getting my job done. But it still lives in the JVM layer that directly injects data into and consumes data from HBase.
I want my Node.js apps to also work under the Akka system, maybe using node-java. Is that good practice? If not, is there another way Node.js can communicate with Akka?
P.S. My question here is about how to make Akka and Node.js work together, not about "why choose Node.js when the JVM has really fast JSON-manipulation libraries?" Our system has already been benchmarked, and Node.js was chosen to handle the JSON manipulation. It is also already in production, so a full migration from Node.js to Scala is not a priority for us today.
Just to clarify: Akka implements message passing as its concurrency model, and it supports message-queue patterns (e.g., broadcast, pub-sub). However, you'd be better off looking at MQ solutions if that is really what you need.
I think going down the path you proposed (running Node.js with Java interop) will yield little benefit while adding significant complexity for the long term.
Better to look for an answer from an architectural point-of-view.
If I had to decide, I would create a Scala or Java Akka microservice that sits between your Node.js front end and HBase. You can get a quick proof of concept running (and back out of it relatively easily).
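A minimal sketch of what such a middle tier could look like with Akka's classic Java API follows. The actor, system, and message shapes are illustrative assumptions; a real service would put an HTTP front in front of the actor and issue HBase Puts inside it:

```java
import akka.actor.AbstractActor;
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;

// Node.js would POST JSON records to an HTTP layer (not shown),
// which forwards each record to this actor; the actor would parse
// the JSON and write it to HBase.
public class IngestActor extends AbstractActor {
    @Override
    public Receive createReceive() {
        return receiveBuilder()
                .match(String.class, json -> {
                    // parse the JSON and issue an HBase Put here
                    System.out.println("ingesting: " + json);
                })
                .build();
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("ingest");
        ActorRef ingester = system.actorOf(Props.create(IngestActor.class), "ingester");
        ingester.tell("{\"row\":\"r1\",\"value\":42}", ActorRef.noSender());
    }
}
```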
PS. If you are committing yourself to HBase, I would highly recommend also looking into Apache Spark, which makes taming Big Data easier.
I am thinking of working on a programming problem for which, I suppose, I will need to know a lot of advanced programming concepts. For various reasons I have decided to code it in Java, even though I am not proficient in it.
So I would like your suggestions, guidance, pointers to resources, books, tutorials, or any generic advice that you think is pertinent.
Here is the basic nature of my problem:
I need to create a client-server architecture. The server supports multiple concurrent clients. Clients send it simple instructions (maybe the server exposes some kind of API or runs a listener on a specific port); the server executes the instructions and sends the results back to the client.
The main job of the server is to do a huge volume of data processing based on the instructions given to it. It takes data from a backend database or file system. Data volume can easily surge to ~200 GB - 700 GB. Data will usually be streamed to it, but it may need to hold a huge volume of data in an in-memory cache during processing (and if RAM is not enough, page it to disk). Computations are generally numerically intensive (let's say taking the inverse of a matrix).
The server should be able to do multithreading. (I don't know exactly what this term means in Java; what I want is for the server to be able to distribute the job across multiple parallel sub-processes.)
The server itself should be very lightweight. I do NOT need any GUI.
It would be great to design it in a way that lets me integrate it later with HPC frameworks like Hadoop.
Now, if I am to do this, what kind of programming do I need to learn? By the way, I have a good understanding of OOP, I am somewhat familiar with data structures and algorithms, and I know basic Java (I have never done any network or multithreaded programming in Java before, but I have used typical OOP concepts, generics, Comparable interfaces, etc.). I mainly work in database programming, but I have also done a lot of C, C++, C#, and Python in the past.
Given the requirements and my background, please suggest:
How should I begin work on this project? What is the right way to architect it?
Should I create some basic API definitions first and then start working on the details?
Should I follow any particular design patterns? Where can I learn them?
What are the things I need to learn in Java, and where can I learn them?
What is the best way to read huge data into memory? Is Java NIO a good solution?
If I instantiate a class holding a huge amount of data, will that work? (For example, say I have a Vector class representing a matrix with millions of elements, and the constructor reads the huge data set into memory.) What is the best way to handle that?
You will want to define how the client and server talk to each other. The easiest way is to use an established protocol such as HTTP, by creating REST services that the client can call without much coding.
Most frameworks that support HTTP create several listeners that run in different threads. This gives you multithreading out of the box.
I'd suggest looking into Spring controllers; Spring is fairly lightweight.
If you use such frameworks, you will want a quick way to find them and incorporate them into your application for compilation and packaging.
I would suggest looking into Maven for this. It's a big time saver, in particular using archetypes to create your project's folder structure and to automatically download dependencies (and their dependencies).
Finally, my words of wisdom: ensure your services are singleton, stateless services. This means you create the objects only once and each thread uses the same objects, so there is far less garbage collection happening. That makes a huge difference when processing large numbers of requests.
Be careful not to use class-level variables to hold state in these services. If you do, different threads will overwrite each other's data.
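To illustrate both points, here is a minimal sketch of a stateless singleton Spring controller, with the class-level-state pitfall shown as a comment; the endpoint and parameter names are invented for illustration:

```java
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

// One instance of this controller serves every request thread,
// so per-request data must live in local variables, never fields.
@RestController
public class ComputeController {

    // BAD: a field like this would be shared (and clobbered) across threads
    // private double lastResult;

    @GetMapping("/invert")
    public String invert(@RequestParam("size") int size) {
        double result = doWork(size); // local variable: thread-safe
        return "result=" + result;
    }

    private double doWork(int size) {
        return 1.0 / size; // placeholder for the real computation
    }
}
```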
First, I would like to say that, going by your description, you seem to be in pretty good shape to use Java as your server-side language.
The kind of client-server architecture you choose may depend on what kind of clients you are actually serving. Will they be typical GUI- or CUI-based desktop clients, or web clients?
In the latter case you could use the Spring Framework in the normal fashion, and in the former you could go further and explore Spring's support for RESTful web services. I would advise against raw socket- or TCP-based networking solutions or low-level Java networking.
Spring's RESTful API gives you a very nice abstraction over things like networking and multithreading, even for a desktop-based client. In that case you can use JSON or XML as the response format and the HttpClient library for making calls to the server, which is itself a very nice abstraction over the underlying networking machinery.
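As a small illustration of such a desktop-client call, here is a sketch using the JDK's built-in java.net.http.HttpClient (Java 11+) rather than the Apache HttpClient library mentioned above; the URL is an assumption tied to the earlier controller sketch:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// A desktop client calling the server's REST API and printing
// the status code and response body.
public class DesktopClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/invert?size=4"))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```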
Furthermore, Spring's design patterns follow a very linear flow of data. A lot of your fundamental design considerations are handled by Spring itself through Dependency Injection and Inversion of Control, which are extremely simple to adopt.
For a detailed analysis of design patterns tied to specific requirements, I suggest the book Java Design Patterns: A Tutorial (Addison-Wesley) by James W. Cooper.
One more thing about the API design: it is preferable to first write an API specification and then go on to implement it.
I have a web-based enterprise application. One problem we are facing is that the backend is tightly coupled to our front end. Now we want to build mobile apps plus a desktop client for our software, so we have decided to move our backend to web services so that multiple front-end clients can use the same function calls. But I'm still wondering: is this the right move? Could another approach work better? Are there pitfalls to using web services? The thing that concerns me most is speed: are web services inherently slower?
Appreciate your help.
Most web services (SOAP, REST) are used to create platform-independent services; that is, a web service can be accessed by applications written for Java, the .NET platform, and so on, without worrying about interoperability between languages and technologies.
Also, you have multiple applications that need to access the same data, and writing a separate data-abstraction layer for each application is not good object-oriented practice.
In your case, web services will be necessary, as you will have several presentation layers (desktop, mobile, and web applications) accessing your system over a network protocol. I would suggest a web service here because you can easily put the business logic and rules inside the service, and each client just performs request/response cycles to get and post data.
It depends on the type of information you're passing back and forth, but in the broader scheme of things an SOA is the way to go if you're looking to have one 'back end' system. I'm sure you'll be happy with the performance, and in some cases even surprised; in most cases the database calls are still the bottlenecks. I've led the charge of implementing an SOA on a lot of projects. In many cases the applications have performed even faster with web services, not because the calls themselves are faster, but because the approach leads you to better design and to take advantage of other technologies like local and remote caching.
I would say there are generally two pitfalls of using web services as the back end.
1.) How they are implemented. Read some books and do some testing before you start writing the real thing. Develop standards and try to stick to them.
2.) A single point of access is a good thing, but not in an unstable or quickly changing environment. Have some redundancy and a backup plan, as well as a sandbox and a staging area.
It depends on what you mean by 'web services': SOAP, REST, or all such technologies?
SOAP services have the big advantage of a well-defined contract through the WSDL, and clients can easily generate stubs from it. On the other hand, SOAP services can also mean a lot more work, for example when used from a client that has no SOAP stack (e.g., iOS or a plain HTML app). Additionally, the SOAP machinery carries a lot of overhead, which can matter if you intend to deliver large data, for example to mobile devices.
There you must take into consideration that the clients may have limited bandwidth (both speed and data volume). Furthermore, the client must process the whole XML document, whereas JSON, for example, can be easier to handle.
It does not degrade performance (speed), because a web service is just SOAP over HTTP. In your case, I think the backend should be exposed as a web service; it would then be feasible to invoke the service from multiple clients. Moreover, you can add extra security on top if you need it.
If you needed to build a highly scalable web application using Java, what framework would you use and why?
I'm currently reading Thinking in Java, Head First Servlets, and Manning's Spring framework book, but really I want to focus on highly scalable architectures.
Would you use Tomcat, Hibernate, Ehcache?
(Just assume you have to design for scale; I'm not looking for 'worry about it when you get traffic' type responses.)
The answer depends on what we mean by "scalable". A lot depends on your application, not on the framework you choose to implement it with.
No matter what framework you choose, the fact is that the hardware you deploy it on will have an upper limit on the number of simultaneous requests it'll be able to handle. If you want to handle more traffic, you'll have to throw more hardware at it and include load balancing, etc.
The part that's pertinent in that case has to do with shared state. If you have a lot of shared state, you'll have to make sure that it's thread safe, "sticky" when it needs to be, replicated throughout a cluster, etc. All that has to do with the app server you deploy it to and the way you design your app, not the framework.
Tomcat's not a "framework", it's a servlet/JSP engine. It has clustering capabilities, but so do most other Java EE app servers. You can use Tomcat if you've already chosen Spring, because Spring implies that you don't have EJBs. Jetty, Resin, WebLogic, JBoss, GlassFish - any of them will do.
Spring is a good choice if you already know it well. I think following the Spring idiom will make it more likely that your app is layered and architecturally sound, but that's not the deciding factor when it comes to scalability.
Hibernate will make your development life easier, but the scalability of your database depends a great deal on the schema, indexes, etc. Hibernate isn't a guarantee.
"Scalable" is one of those catch-all terms (like "lightweight") that is easy to toss off but encompasses many considerations. I'm not sure that a simple choice of framework will solve the issue once and for all.
I would check out Apache Mina. From the home page:
Apache MINA is a network application framework which helps users develop high performance and high scalability network applications easily. It provides an abstract, event-driven, asynchronous API over various transports such as TCP/IP and UDP/IP via Java NIO.
It has an HTTP engine, AsyncWeb, built on top of it.
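For a flavour of MINA's event-driven style, here is a minimal echo-server sketch against the MINA 2.x API; the port is arbitrary, and a real server would add a protocol codec filter:

```java
import java.net.InetSocketAddress;
import org.apache.mina.core.service.IoHandlerAdapter;
import org.apache.mina.core.session.IoSession;
import org.apache.mina.transport.socket.nio.NioSocketAcceptor;

// One handler receives events for all connections; the NIO
// selector threads are managed entirely by the framework.
public class EchoServer {
    public static void main(String[] args) throws Exception {
        NioSocketAcceptor acceptor = new NioSocketAcceptor();
        acceptor.setHandler(new IoHandlerAdapter() {
            @Override
            public void messageReceived(IoSession session, Object message) {
                session.write(message); // echo back asynchronously
            }
        });
        acceptor.bind(new InetSocketAddress(8080));
    }
}
```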
A less radical suggestion (!) is Jetty - a servlet container geared towards performance and a small footprint.
The two keywords I would mainly focus on are asynchronous and stateless. Or at least "as stateless as possible": of course you need state, but maybe, instead of going for a full-fledged RDBMS, have a look at document-centered datastores.
Have a look at Akka concerning async, and at CouchDB or MongoDB as datastores.
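To show the asynchronous half of that advice in plain servlet terms (Servlet 3.0+), here is a hedged sketch in which the request thread is released while the work completes elsewhere; the URL pattern and payload are invented:

```java
import java.io.IOException;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// The container thread returns immediately; the response is
// completed later from a container-managed worker thread.
@WebServlet(urlPatterns = "/report", asyncSupported = true)
public class AsyncReportServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext ctx = req.startAsync();
        ctx.start(() -> {
            try {
                ctx.getResponse().getWriter().write("report ready");
            } catch (IOException ignored) {
            } finally {
                ctx.complete(); // frees the connection
            }
        });
    }
}
```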
Frameworks are more geared towards speeding up development, not performance. There will be some overhead with any framework because of use cases it handles that you don't need. Granted, the overhead may be low, and most frameworks will point you towards patterns that have been proven to scale, but those patterns can be used without the framework as well.
So I would design your architecture assuming 'bare metal', i.e. pure servlets (yes, you could go even lower level, but I'm assuming you don't want to write your own http socket layer), straight JDBC, etc. Then go back and figure out which frameworks best fit your architecture, speed up your development, and don't add too much overhead. Tomcat versus other containers, Hibernate versus other ORMs, Struts versus other web frameworks - none of that matters if you make the wrong decisions about the key performance bottlenecks.
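As a baseline for that "bare metal" thinking, a pure servlet looks like this; the mapping and response body are placeholders:

```java
import java.io.IOException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// No framework on top: just the servlet API the container gives you.
// Everything a framework adds sits above this layer.
@WebServlet("/ping")
public class PingServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        resp.getWriter().write("pong");
    }
}
```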
However, a better approach might be to choose a framework that optimizes for development time and then find the bottlenecks and address those as they occur. Otherwise, you could spin your wheels optimizing prematurely for cases that never occur. But that probably falls in the category of 'worry about it when you get traffic'.
All popular modern frameworks (and "stacks") are well-written and don't pose any threat to performance and scaling, if used correctly. So focus on what stack will be best for your requirements, rather than starting with the scalability upfront.
If you have a particular requirement, then you can ask a question about it and get recommendations about what's best for handling it.
There is no framework that will magically make your web service scalable.
The key to scalability is replicating the functionality that is (or would otherwise be) a bottleneck. If you are serious about making your service scale, you need to start with a good understanding of the characteristics of your application, and hence an idea of where the bottlenecks are likely to be:
Is it a read-only service or do user requests cause primary data to change?
Do you have / need sessions, or is the system RESTful?
Are the requests normal HTTP requests with HTML responses, or are you doing AJAX, callbacks, or something else?
Are user requests computation-intensive, I/O-intensive, or rendering-intensive?
How big/complicated is your backend database?
What are the availability requirements?
Then you need to decide how scalable you want it to be. Do you need to support hundreds, thousands, millions of simultaneous users? (Different degrees of scalability require different architectures, and different implementation approaches.)
Once you have figured these things out, you can decide whether there is an existing framework that can cope with the level of traffic you need to support. If not, you will need to design your own system architecture to be scalable in the problem areas.
If you are able to work with a commercial system, I'd suggest taking a look at Jazz Foundation (http://jazz.net). It's the base for IBM Rational's newer products. The project is led by the people who developed Eclipse within IBM before it was open-sourced. It has a pluggable DB layer and supports multiple app servers, it's designed to handle clustering and multi-site deployments, and it has nice capabilities like OAuth support and license management.
In addition to the above:
Take a good look at JMS (Java Message Service). It is a much underrated technology. There are vendor solutions such as TIBCO EMS and Oracle's offerings, but there are also free stacks such as ActiveMQ.
JMS allows you to build synchronous and asynchronous solutions using queues, and you can choose persistent or non-persistent delivery.
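A minimal sketch of asynchronous enqueueing over JMS with ActiveMQ follows; the broker URL and queue name are assumptions for illustration:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import org.apache.activemq.ActiveMQConnectionFactory;

// Fire-and-forget: the producer returns as soon as the broker
// accepts the message; a consumer elsewhere processes the queue.
public class JmsEnqueue {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        try {
            connection.start();
            Session session =
                    connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("work.requests");
            MessageProducer producer = session.createProducer(queue);
            TextMessage message = session.createTextMessage("job-42");
            producer.send(message); // caller is not blocked on processing
        } finally {
            connection.close();
        }
    }
}
```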
As others have already replied, scalability isn't about which framework you use. Sure, it's nice to squeeze as much performance as possible out of each node, but what you ideally want is for adding another node to scale your app linearly.
The application should be architected in distinct layers so it is possible to add more power to different layers without a rewrite, and also to add caching at different layers. Caching is key to achieving speed.
One example of layers for a big webapp:
Load balancers (TCP level)
Caching reverse proxies
CDN for static content
Front-end web servers
Appservers (business logic of the app)
Persistent storage (RDBMS, key/value, document)