Composition of Grid-enabled Web Services for Integration and Sharing of Distributed Resources through Web based Interfaces
Niraj Kumar
Software Developer
Kolkata - 700026.
West Bengal, India
E-mail: nirajkumariitkgp@gmail.com
Contact No: (Mobile).
© 2006 Niraj Kumar. All rights reserved.
ABSTRACT
Traditional computer architectures and integration mechanisms are biased towards
tightly coupled client-server architectures and centralized databases. However, the
phenomenal growth of web technologies and the emergence of the Web as the world's largest
database have stretched the ability of people and organizations to use these resources
effectively to its limits. Computer scientists, researchers and organizations throughout the
world are trying to develop mechanisms to make effective use of these distributed and
heterogeneous resources and gain a competitive advantage in the market. In this study, we
propose to develop grid-enabled web services accessed through web-based interfaces, which may
considerably enhance our ability to share distributed and heterogeneous resources and
services among institutions and organizations throughout the world.
Key Words: XML, SOAP, UDDI, WSDL, Web Services, Grid Computing, Java.
THE PROBLEM
To meet the challenges posed by a competitive world, there is a need for a system that
can integrate various distributed databases and web resources on the WWW and bring
these heterogeneous data sources onto a common platform, or into the form required by the
user. There is also a need for a mechanism that allows resources and services to be shared
among organizations and institutions. Through portal interfaces, clients should be
able to get the information they need in real time, with the system seamlessly integrating
resources and services spread across the WWW and across organizations and institutions while
hiding the implementation details from the user. The system must be highly flexible and
scalable, and it must be possible to add and remove services at any time without affecting
its performance. This requires that clients and servers be independent of each other.
Introduction
Organizations and institutions today need to share and integrate various resources and
services to meet the task-specific requirements of particular users and applications.
This is a challenging task, because these resources and services have different
structures, contents and query languages, return data in different formats, and run on
different underlying hardware and networks. Furthermore, their interfaces and formats may
be updated without warning. As problems grow more complex, a typical user task needs to
interact with a number of resources and services distributed over geographically distant
locations and controlled by different agencies. Many of these agencies have security
concerns and want to share only a limited set of resources with others. They also need the
flexibility to add or remove resources, so our proposed system should keep running when
some resources and services are no longer available or when new ones are added.
This requires that machines be able to communicate with one another with little
human intervention.
Our current web applications follow the traditional client-server model of software
architecture: clients and servers are closely interrelated, inflexible and tightly
coupled, so the whole system can stop working if either the client or the server is
changed independently. Our traditional database and information integration systems
(such as Enterprise Resource Planning) are likewise biased towards centralized systems, in
which the organization's resources are integrated through centralized databases, mostly
with proprietary software. This is contrary to the spirit of the Web: a loose, open
confederation of resources held together by simple protocols. Several distributed-object
technologies and standards, such as DCOM, CORBA and RMI, came up during the 1990s to
address these problems. In practice, however, they are not well suited to the Internet and
introduce a degree of dependency and/or platform issues. They do not fully remove the need
for a client application to know about the architecture of the participating distributed
objects. Meanwhile, advances in Servlet and JavaServer Pages (JSP) technologies and the
emergence of platforms such as J2EE from Sun Microsystems and .NET from Microsoft made fast
and relatively secure web-based applications a practical reality.
This set the stage for XML-based Web services, an exciting new technology
that enables communication between heterogeneous computer systems. Web
services emerged as a standard only in the last three years. At its core, the technology is
simply XML moving from one computer to another in a form that each computer can reliably
process. It is a significant improvement on traditional systems integration and has
significant implications for any organization, because it broadens the scope of
computer-to-computer communication. Web services are built on three primary Internet
standards: the Simple Object Access Protocol (SOAP), the Web Services Description
Language (WSDL) and the Universal Description, Discovery and Integration (UDDI)
directory. These standards are now supported by all major software technology vendors;
SOAP and WSDL have been published by the W3C, while UDDI is an OASIS standard. Grid-based
computer applications can be considered the next step in this chain of events, making even
heavy-duty tasks solvable by using diverse and heterogeneous resources and services.
In this case study, we propose to develop an XML-based Web services and grid-based
solution using various Java-based technologies, which can facilitate the transfer and sharing
of resources across organizations and institutions around the world through web-based
interfaces on a portal. We want to make it possible to share and transfer not only
lightweight resources and services but also heavyweight ones.
Our aim is to develop a full-fledged, independent software product and methodology
that can be used directly by any portal that needs to share and integrate resources
over the Internet, or among enterprises and their partners, and that can be marketed as a
product in its own right.
GRID COMPUTING AND WEB SERVICES FOR THE FUTURE
Imagine a scenario in which anybody can run any program through a simple interface,
without downloading any software, and in which all barriers of platforms, databases and
networks vanish. The grid computing systems of the future should be able to provide
solutions to these problems. Autonomic computing and smart network technologies are also
expected to emerge that automatically detect changes in the system and take appropriate
action. Such systems are likely to provide user-friendly interfaces for remote job
monitoring that show, in real time, the status of the results computed at each node, and
more and more web-based interfaces are likely to be added for all activities related to
grid computing. Web services are well suited to these grid activities for four main
reasons. First, grid systems require the dynamic discovery and composition of services in
heterogeneous environments, which calls for mechanisms to register and discover interface
definitions and endpoint implementation descriptions, and to generate proxies dynamically
from (potentially multiple) bindings for specific interfaces. WSDL supports this
requirement by providing a standard mechanism for defining interfaces separately from
their embodiment in a particular binding (transport protocol and data encoding format).
Second, numerous tools already support WSDL, including processors that can generate
language bindings for a variety of languages and platforms. Third, because Web services
communicate over HTTP, they can reach systems sitting behind firewalls, which usually do
not block the HTTP port. Fourth, computationally intensive tasks often need to interact
with several computers, databases and other resources at geographically distributed
locations, which requires a way to define workflows. The Web Services Flow Language (WSFL)
and Microsoft XLANG, XML languages for describing workflow processes in distributed and
heterogeneous environments, offer excellent potential for this purpose.
The Web services framework is likely to be integrated with the grid computing systems of
the future. Attempts in this direction are already being made, but the technology has not
yet matured to the stage where the potential of Web services can be fully harnessed for
grid computing. Grid computing has, however, started to leverage Web services to define
standard interfaces for business services and institutional needs. The grid is likely to
provide a virtual integrated environment in which people from different organizations and
locations work together to solve specific problems, a typical case of dynamic resource
sharing and information exchange. Grid computing platforms are likely to support resource
discovery, resource sharing and collaboration in a distributed environment in a more
user-friendly way.
Future-generation grid-enabled Web services should be able to accomplish the following
tasks:
• Use computing power more efficiently: jobs can be sent to the node that has the least load.
• Break complex jobs up and run them on multiple nodes in parallel, providing a significant
performance increase. This kind of structure is known as a computational grid.
• Store large amounts of data in a structure that spreads over many systems yet can still be
accessed as if it were part of a single node. This structure, similar to a federated
database, is known as a data grid.
• Run different parts of an application on systems with different characteristics.
However, any grid system requires users to submit input files appropriate to their
requirements and problems, and to define the problem's algorithm in a suitable language.
This requires considerable domain expertise in the problem area, as well as an
understanding of the processes involved in grid systems, to use it efficiently.
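The first two tasks above, sending work to the least-loaded node and spreading a job's tasks over several nodes, can be illustrated with a minimal Python sketch. It assumes that each task carries a rough cost estimate and that each node reports a numeric load; the function name `assign_tasks` and the data shapes are hypothetical, and the largest-task-first greedy rule is a standard load-balancing heuristic chosen for illustration, not the scheduler of any particular grid toolkit.

```python
import heapq


def assign_tasks(tasks, nodes):
    """Greedily send each task to the currently least-loaded node.

    tasks: list of (task_id, cost) pairs, where cost is an estimated load.
    nodes: dict mapping node name -> current load.
    Returns a dict mapping node name -> list of assigned task ids.
    """
    # Min-heap of (load, node name): the least-loaded node is popped first,
    # and the name breaks ties deterministically.
    heap = [(load, name) for name, load in nodes.items()]
    heapq.heapify(heap)
    plan = {name: [] for name in nodes}
    # Placing the largest tasks first usually gives a better balance.
    for task_id, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        load, name = heapq.heappop(heap)
        plan[name].append(task_id)
        # The chosen node's load grows by the task's estimated cost.
        heapq.heappush(heap, (load + cost, name))
    return plan
```

For example, `assign_tasks([("t1", 4), ("t2", 3), ("t3", 2), ("t4", 1)], {"nodeA": 0, "nodeB": 2})` places t1 and t3 on nodeA and t2 and t4 on nodeB, leaving both nodes with an estimated load of 6.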
Using SOAP for Communication in Grid environment
We need a mechanism to send requests to, and receive responses from, remote services
and resources in a grid environment. Web services expose an object's methods via SOAP,
and a request/response exchange follows these steps:
• The client application builds a SOAP message, an XML document describing the desired
request/response operation.
• The client sends the SOAP message to a JSP page on a Web server that listens for SOAP
requests.
• The SOAP server parses the SOAP message and invokes the appropriate method on the
appropriate object in its domain, passing in the parameters included in the SOAP document.
• The invoked object performs the requested function and returns data to the SOAP server,
which packages the response in a SOAP envelope. The server (for example, a servlet) then
sends the enveloped response back to the requesting machine.
• The client receives the response, strips off the SOAP envelope and passes the response
document to the program that originally requested it, completing the request/response
cycle.
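The message shapes in this cycle can be sketched with Python's standard `xml.etree.ElementTree` module. This is only an illustration under stated assumptions: the service namespace `urn:example:grid` and the `getJobStatus` operation are hypothetical, the HTTP transport to the JSP page or servlet is left out, and a real deployment would use a SOAP toolkit such as Apache SOAP or Axis rather than hand-built XML.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:grid"  # hypothetical namespace of the grid service
ET.register_namespace("soapenv", SOAP_ENV)


def build_envelope(operation, params):
    """Wrap an operation (request or response) and its parameters in a SOAP 1.1 envelope."""
    env = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(env, encoding="unicode")


def parse_envelope(xml_text):
    """Strip the envelope and return (operation name, {parameter: text})."""
    env = ET.fromstring(xml_text)
    call = env.find(f"{{{SOAP_ENV}}}Body")[0]  # first child of Body is the operation
    operation = call.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
    return operation, {child.tag: child.text for child in call}


def get_job_status(job_id):
    """Hypothetical server-side object method exposed through SOAP."""
    return "RUNNING" if job_id == "42" else "UNKNOWN"


def handle_request(xml_text):
    """SOAP server side: parse the request, invoke the method, envelope the result."""
    operation, params = parse_envelope(xml_text)
    if operation != "getJobStatus":
        raise ValueError(f"unknown operation: {operation}")
    status = get_job_status(params["jobId"])
    return build_envelope("getJobStatusResponse", {"status": status})
```

A client would call `build_envelope("getJobStatus", {"jobId": 42})`, send the text over HTTP, and apply `parse_envelope` to the reply, which yields `("getJobStatusResponse", {"status": "RUNNING"})`.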
Managing Work Flow in Grid
Once the resources are discovered, workflow in the grid can be established using the Web
Services Flow Language (WSFL) or Microsoft XLANG, XML languages for describing workflow
processes and spawning them. WSFL specifies how one Web service is interfaced with another;
with it, we can determine whether a Web service should be treated as a single activity in
one workflow or as a series of activities. WSFL complements WSDL (Web Services Description
Language) and is transition-based, whereas XLANG is an extension of WSDL and is
block-structured. WSFL supports two model types: flow models and global models. The flow
model describes the business process that a collection of Web services needs to achieve;
the global model describes how Web services interact with one another. XLANG, on the other
hand, allows Web services to be orchestrated into business processes and composite Web
services. WSFL is strong on model presentation, while XLANG does well with long-running
interactions between Web services. Web services and resources can be declared as private or
public.
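WSFL and XLANG documents are XML, but the core rule a workflow engine has to enforce, running an activity only after the activities it depends on have finished, can be sketched with Python's `graphlib.TopologicalSorter` (Python 3.9 or later). The activity names below are loosely based on the dataflow steps of this proposal and are hypothetical; this is a dependency-ordering sketch, not an implementation of either language.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each activity maps to the set of activities it waits for.
WORKFLOW = {
    "find_nodes": set(),
    "transfer_to_node1": {"find_nodes"},
    "transfer_to_node2": {"find_nodes"},
    "submit_jobs": {"transfer_to_node1", "transfer_to_node2"},
    "collect_results": {"submit_jobs"},
    "integrate_results": {"collect_results"},
}


def run_workflow(graph, action):
    """Call action(name) for every activity, never before its predecessors finish.

    Returns the order in which the activities were run.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the workflow contains a cycle
    order = []
    while sorter.is_active():
        # Activities in one ready batch do not depend on each other.
        for name in sorted(sorter.get_ready()):
            action(name)
            order.append(name)
            sorter.done(name)
    return order
```

Activities returned together by `get_ready()`, such as the two transfers here, have no mutual dependencies and could be dispatched to different nodes in parallel.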
Monitoring Remote Jobs in Grid environment
In a complex system like the grid, monitoring is essential for understanding its operation,
debugging, detecting failures and optimizing performance. The monitoring system must be
able to provide information about the current state of grid entities such as resources and
running jobs, and to send notifications when certain events (e.g., system failures or
performance problems) occur. Monitoring jobs requires interoperation between the monitoring
system and other grid services. A running application consists of processes running on the
hosts that make up a grid resource, and the operating system identifies these processes
locally by process identifiers (PIDs). The local resource management system (LRMS)
controls the jobs running on the hosts belonging to a grid resource. It allocates hosts to
jobs, starts and stops jobs at the user's request, and possibly restarts jobs after an
error. It may also checkpoint jobs and migrate them between hosts of the resource, which can
be considered a special case of job startup. The LRMS identifies each job it manages by a
local job identifier (LJID), so to monitor a job the monitoring system has to know the
relation between LJIDs and PIDs. There are various ways to accomplish this, and each grid
system implements it differently. Future-generation grid systems should provide job
submission, job monitoring, job status and job output through web-based interfaces to make
them accessible to the common man.
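The LJID-to-PID relation described above can be sketched as a small bookkeeping class. It is hypothetical: the method names and the assumption that the LRMS reports process start and exit events are illustrative, not taken from any grid system. PIDs are stored together with the host name because a PID is only unique on its own host.

```python
from collections import defaultdict


class JobMonitor:
    """Relates grid job ids to LRMS local job ids (LJIDs) and LJIDs to running PIDs."""

    def __init__(self):
        self.grid_to_ljid = {}                # grid job id -> LJID
        self.ljid_to_pids = defaultdict(set)  # LJID -> {(host, PID), ...}

    def job_started(self, grid_job_id, ljid):
        """The LRMS accepted a grid job under a local job identifier."""
        self.grid_to_ljid[grid_job_id] = ljid

    def process_started(self, ljid, host, pid):
        self.ljid_to_pids[ljid].add((host, pid))

    def process_exited(self, ljid, host, pid):
        self.ljid_to_pids[ljid].discard((host, pid))

    def processes_of(self, grid_job_id):
        """Current (host, PID) pairs belonging to a grid job, sorted for display."""
        ljid = self.grid_to_ljid[grid_job_id]
        return sorted(self.ljid_to_pids.get(ljid, set()))
```

Migration then becomes an exit on the old host followed by a start on the new host under the same LJID, so the grid job id keeps resolving to the job's current processes.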
System Architecture, Design and Modeling
Our architecture focuses on providing a virtual integrated environment among
organizations and institutions through web-based interfaces that are easy to use, flexible
and secure. It provides mechanisms to share a large number of resources, such as real-time
and on-demand video lectures, scientific databases, queries to partner institutions and
lecture notes, although our primary focus is on providing computational resources for
computation-intensive scientific and engineering tasks. Our architecture aims to support
platform-independent and heterogeneous resources.
Initially, our focus is on supporting the Java, C/C++ and Fortran languages. For scripting,
we intend to support Perl, shell scripts and DOS batch scripts. As far as operating systems
are concerned, we intend to support Windows and Unix-based platforms. To make this
possible, our architecture is XML-based and combines Web services concepts with grid
computing concepts. It is also based on autonomic computing concepts and is intended to
integrate intelligent network technologies such as Jini with grid computing, so that the
system can dynamically sense changes in the network environment and take appropriate action
automatically. The architecture can take into account any number of systems and services
being added to or removed from the system in real time, and it can display the status and
output of the jobs at each node in real time.
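One way to let the system notice nodes joining and leaving without manual intervention is leasing, an idea used by Jini: a node registers for a limited time and must keep renewing its registration, so a node that crashes or is switched off simply expires. The Python sketch below borrows only that idea; it does not use the Jini API, and the class name, attribute names and lease length are assumptions. Time is passed in explicitly to keep the example deterministic.

```python
class LeaseRegistry:
    """Registry in which every node holds a time-limited lease (idea borrowed from Jini)."""

    def __init__(self, lease_seconds=30.0):
        self.lease_seconds = lease_seconds
        self._expiry = {}  # node name -> time at which its lease expires
        self._info = {}    # node name -> advertised attributes (e.g. OS, languages)

    def register(self, node, info, now):
        """Join or renew: (re)start the node's lease at time `now`."""
        self._info[node] = info
        self._expiry[node] = now + self.lease_seconds

    def withdraw(self, node):
        """Explicit departure; an unknown node is ignored."""
        self._expiry.pop(node, None)
        self._info.pop(node, None)

    def live_nodes(self, now, **required):
        """Drop expired leases, then return nodes whose attributes match `required`."""
        for node in [n for n, t in self._expiry.items() if t <= now]:
            self.withdraw(node)
        return sorted(n for n, info in self._info.items()
                      if all(info.get(k) == v for k, v in required.items()))
```

Because lookups match on advertised attributes, the same registry can serve the platform requirements discussed above, for example `live_nodes(now, os="unix")` to find only Unix nodes whose leases are still valid.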
The client submits a job to the web portal through a graphical user interface
(GUI). The web portal delegates the management of jobs to schedulers. A scheduler
divides a job into smaller tasks (for an independent job, a task is a subset of parameters
that can be executed independently) and sends the tasks to the resources for execution.
Ease of use is achieved by encapsulating the system behind a simple, well-defined
interface: the execution service provided by the resources is wrapped in a web service
interface, so any user can consume it easily, and the scheduler likewise encapsulates the
complexity of job scheduling behind a web service interface. This use of web service
interfaces keeps the client-side implementation simple. The web portal provides GUIs for
job submission and management, allowing the client to submit and monitor jobs easily. The
parameter file can contain a range or a list of input values. The scheduler parses the
parameter file and splits the input values, so that it can distribute the job to many
resource machines, each receiving a different range or list of input values that is a
subset of the submitted ones.
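The parameter-file splitting described above can be sketched as follows. The file syntax (`name = lo..hi` for an inclusive integer range, `name = a, b, c` for a list of integers) is an assumption made for illustration, since the proposal does not fix a format; the splitting keeps each machine's subset contiguous and the subset sizes within one value of each other.

```python
def parse_parameter_line(line):
    """Parse 'name = lo..hi' (inclusive integer range) or 'name = a, b, c' (integer list)."""
    name, spec = (part.strip() for part in line.split("=", 1))
    if ".." in spec:
        lo, hi = (int(bound) for bound in spec.split("..", 1))
        return name, list(range(lo, hi + 1))
    return name, [int(value) for value in spec.split(",")]


def split_values(values, parts):
    """Split input values into at most `parts` contiguous chunks differing in size by at most one."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    size, extra = divmod(len(values), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)  # the first `extra` chunks get one more value
        if end > start:  # skip empty chunks when there are fewer values than machines
            chunks.append(values[start:end])
        start = end
    return chunks
```

For example, `x = 1..10` split over three machines gives the subsets 1-4, 5-7 and 8-10.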
The status of each subpart of the job submitted to each node is displayed through a
web-based interface in real time. We also propose to show the respective clients, through
the same interface, how each node is performing on their job, including the available
memory, current load and average CPU performance of each node. When the computation is
complete, the final result is generated and displayed through the web-based interface, and
an event is automatically generated to send the output to the client by e-mail or to a
directory on the client's computer.
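The per-subtask status display and the completion event can be sketched as a small tracker for one job. The `on_complete` hook is hypothetical: in the proposed system it might e-mail the integrated result or copy it to the client's directory, while a status page would render `snapshot()`.

```python
class JobTracker:
    """Tracks the status of each subtask of one job and fires an event when all are done."""

    def __init__(self, subtasks, on_complete):
        self.status = {t: "PENDING" for t in subtasks}  # insertion order = subtask order
        self.results = {}
        self.on_complete = on_complete
        self._fired = False

    def update(self, subtask, status, result=None):
        """Record a status report from a node; keep the result when a subtask finishes."""
        self.status[subtask] = status
        if status == "DONE":
            self.results[subtask] = result
        if not self._fired and all(s == "DONE" for s in self.status.values()):
            self._fired = True
            # Integrate the partial results in subtask order and notify the client once.
            self.on_complete([self.results[name] for name in self.status])

    def snapshot(self):
        """What a status web page would display at this moment."""
        return dict(self.status)
```

The `_fired` flag makes the notification fire exactly once, even if a node repeats its final report.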
[Figure: Simplified view of integrating two organizations' resources for the grid. CyberSWIFT and its partner organizations, each with a web and application server, a database server and other resources (computers, video lectures, files), exchange information over the WWW through a web portal using communication protocols such as XML over HTTP, XML over FTP and XML over HTTPS.]
[Figure: Grid-enabled Web service dataflow diagram. Clients 1, 2 and 3 send HTTP requests to the web portal, which finds the appropriate nodes, transfers the data (authenticate with the server, connect to the server, transfer the file as XML over FTP, close the connection), submits the jobs, collects and integrates the results, and returns the result.]
Proposed Design Methodology
The design methodology for this study proceeds in stages, each of which can be
considered a separate software module. The primary challenge in designing grid-based
Web services is to look beyond the traditional client-server paradigm, in which the client
is tightly coupled to the server. Here the client must be designed completely independently
of the server, so that it keeps working even when changes take place on the server side,
which in this case is not under our control, since services can be withdrawn and new ones
added without our knowledge. On the server side, the system must let partner institutions
choose which types of services and resources they want to share, and withdraw or add
services and resources at any time. This kind of system should therefore be based on the
following design principles:
• Client and server applications should be independent of one another.
• Applications should be built from discrete components that coordinate server-based modules.
• Services and resources should be discovered by querying directories.
• Services should be transient.
• Services should support extension and be able to degrade when no longer needed.
• A mechanism to describe the services (e.g., a WSDL implementation).
• A mechanism to communicate with the services (e.g., a SOAP implementation).
• A mechanism to publish services and resources in a registry (an implementation of UDDI).
• A mechanism to discover available services and resources (an implementation of UDDI).
• A mechanism to break complex jobs into simple jobs to be submitted to available,
least-loaded nodes. When the load on any node changes, it should be able to redistribute
the jobs automatically to the least-loaded nodes.
• A mechanism to display the status of the jobs submitted at each node in real time, with
related statistics on memory status and CPU performance and usage, and the changes in them
over time. It should also display the output produced by each node.
• A mechanism to send the result back to the client (using SOAP).
Keeping these guidelines in mind, the system design proceeds in the following stages:
STAGE I:
In this stage we propose to design user interfaces that minimize the number of pages
clients need to view and that are independent of the available services and resources.
STAGE II:
In this stage we have developed small, elementary grid computing systems for
elementary computations. We have evaluated various available grid computing systems,
such as Globus, UNICORE, Condor, JCGrid, JGrid and OptimalGrid. We also propose to test
all these grid systems by running sample jobs across a number of Windows and Unix-based
environments. The idea is to decide which of these grid systems is best suited to our
purpose. Here we need to take into account the computing facilities available now or
likely to be available in the future, the operating systems they run, the technical level
and commitment of the people involved, the available network facilities, and security,
firewall and other administrative decisions about how to give clients and partner
institutions access to our facilities. We also need to consider the corresponding
environments of our partner institutions, the general state and understanding of grid
systems in the country, their current and future potential, and the requirements for such
a system.
In this stage we also propose to develop web-based interfaces for job submission, job
monitoring, output-condition monitoring and output display, specific to the grid system we
adopt based on our suitability and capabilities. Once this interface is in place, users
should be able to start the server and submit their jobs. Similar interfaces will be
provided to partner institutions, with the difference that no public interfaces will be
offered; these interfaces will only allow them to transfer their resources to IIITMK or to
submit services and resources to the registry.
STAGE III:
In this stage we implement the logic of the web interface using Web services concepts and
a suitable grid system, which we customize according to our requirements and what is
feasible. We combine the computing power of the various Windows and Unix PCs available,
make the system fully operational, and provide facilities for remote job submission and
status monitoring through a single web interface for any computation-intensive job. We
also provide support for various languages and scripts, and integrate the whole system with
partner institutions and industry. We implement autonomic grid computing and intelligent
network concepts with suitable technologies so that the system can meet the requirements
of future-generation grid systems. We also optimize the whole system to obtain the best
possible throughput and CPU utilization of the available nodes at geographically
distributed locations. In addition, we aim to enhance our capabilities in applying Web
services concepts to grid computing, and to train people and students in these
technologies and systems so that the system can be used with maturity.
STAGE IV:
In this stage we develop web-based interfaces for many other grid computing systems and
continuously improve and upgrade our system as newer technologies are developed. We also
try to make grid computing technologies available to the common man, requiring very
little technical knowledge of computers. As future systems are likely to include devices
of different kinds and capabilities in terms of memory and CPU, we aim to provide grid
facilities through all those devices. We also look into developing cutting-edge
technologies and products in these areas, as well as algorithms for defining workflows,
job monitoring and so on in distributed environments. Finally, we explore the possibility
of a system that anybody can use without downloading software, simply by sending the
appropriate program to it.
Summary and Conclusion
In this study, we have presented a framework for sharing distributed and heterogeneous
resources and services among organizations and institutions, together with the various
modules of the framework. We have reviewed some current grid systems and their usefulness
through sample case studies, taking into account the strengths and weaknesses of the
organizations. We have implemented sample SOAP-, Axis- and Jini-based web services and made
them available through our web-based interfaces. We have emphasized the importance of
combining autonomic computing, intelligent networks and Web services concepts with current
grid computing systems to make them more effective, efficient and accessible to the common
man.
In summary, we recommend as a grid computing system for the future a simple system that
combines the power of Web services, autonomic computing and intelligent network concepts,
can pool the computing power of twenty to thirty computers on different platforms,
possibly geographically distributed, through web-based interfaces, and hides the
complexity of the implementation from its users.
Challenges and Potential Research Directions
In this section we point out potential challenges and future research directions for
this study.
• As discussed in the assumptions, publicizing grid-enabled web services through web-based
interfaces may raise security concerns. For example, if they are open to everyone, hackers
and malicious users can overload the system by submitting dummy jobs. We may consider
providing these facilities over HTTPS or developing several levels of security mechanisms
for such a system.
• Developing browser plug-ins specifically for grid computing is another interesting area
that requires attention.
• Web services concepts have made significant advances in the last few years, but their
focus so far has been on providing lightweight services. We can also envisage services
that require no software to be downloaded and installed: anybody should be able to use them
by sending a request with the appropriate input files. We may develop some services of this
kind in the future.
• Developing pricing mechanisms for such services is another important area that can be
explored.
• Developing domain-specific tools for using such grid systems optimally, for example in
drug design, earth sciences and bioinformatics, is another area requiring further
exploration.