Category: software

Tag Services for Atlas

My presentation at the Atlas Software Workshop:
tagservices

Atlas TAGs

I presented two talks at the Event TAG Developer Workshop at CERN:

Extract and Skim Tags via Athenaeum

The public service interface is implemented by the tagExtract_trf.py script, which is also linked to the tagSkim_trf.py script. The two scripts run on different ports, but that is only a formal organisation (clients can access both servers in the same way).

Servers can be called

Skim extracttags

(Other Athenaeum posts)

New extract command

A new command is available to perform extraction from the Event Tag database on the command line:


source /afs/cern.ch/sw/lcg/external/Java/bin/setup.sh
extract
	-manager [CERN|CHICAGO], default = CERN
	-python <Python options file>
	-url <worker ip:port>, default = http://lxvm0341.cern.ch:10001
	-key <insider key>
	-output <output Root file>, default = test_<random>.root
	-query <sql query>
	-collname <collection name>
	-lumi <luminosity>, default = Unknown
	-release <release>, default = Atlas,takeFromEnv
	-conn <connection string>
	-target <target directory>, default = /tmp
	-atts <requested attributes>, default = RunNumber,EventNumber
	-email <notification email>
	-lumiencode <is luminosity encoded ?>, default = False

The user can choose the (Athenaeum) extraction manager (-manager) and the extraction worker (-url). All options can be collected in a Python options file (-python); other options file formats will follow soon. Options are applied in the order in which they appear on the command line and in the options files. That means an options file overrides all command-line options set before it, while its own options are overridden by command-line options set after it.
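The ordering rule is easy to get wrong, so here is a minimal sketch of it. This is not the actual extract code: the function merge_options, the dict FILES standing in for an options file on disk, and the file name opts.py are all assumptions made for illustration.

```python
def merge_options(argv, load_options_file):
    """Apply '-name value' pairs left to right; later settings win."""
    options = {}
    for flag, value in zip(argv[0::2], argv[1::2]):
        name = flag.lstrip("-")
        if name == "python":
            # An options file overrides everything set before it ...
            options.update(load_options_file(value))
        else:
            # ... and is itself overridden by anything set after it.
            options[name] = value
    return options


# Hypothetical options file content, kept in a dict instead of on disk.
FILES = {"opts.py": {"output": "from_file.root", "lumi": "Unknown"}}

argv = ["-output", "early.root", "-python", "opts.py", "-lumi", "42"]
merged = merge_options(argv, FILES.__getitem__)
print(merged)
```

Here -output is set before the options file and is therefore replaced by the value from the file, while -lumi comes after the file and wins over it.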

When tag extraction is performed with ELSSI, the job notification email contains the command line needed to reproduce the same job.


New Histogramming service for Atlas Tag database

There is a new histogramming service for the Atlas Tag database. It is now a completely independent service hosted on the CERN J2EE server. It can be called from ELSSI or from the Web. The service talks directly to the Oracle database. It supports all databases from the tnsnames.ora file (as a byproduct, a script which translates tnsnames.ora into a context.xml usable in JSP services has been written). Currently, the service shows only 1-D histograms; other functionality (2-D histograms, fits, …) could easily be added if needed, because there is a complete analysis package running behind the service.

The accumulated testing examples are available from http://cern.ch/SQLTuple/HistogramTest.html. The service API is:


http://cern.ch/SQLTuple/Histogram.jsp
?database=<database name as declared in tnsnames.ora>
&select=<SQL select clause as "column_name as histogram_name">
&from=<comma separated table names as "schema1.table1,schema2.table2">
&where=<SQL where clause>
&limit=<maximum number of rows to analyse, "0" indicates no limit, the default is "1000">
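As a rough illustration of how a client could assemble such a request, the sketch below builds the URL with Python's standard urllib. The helper histogram_url is my own; the database name mydb, the column NJet and the where clause are made-up placeholders, not real Tag database names.

```python
from urllib.parse import urlencode

BASE_URL = "http://cern.ch/SQLTuple/Histogram.jsp"


def histogram_url(database, select, tables, where=None, limit=1000):
    """Build a Histogram.jsp request URL with properly escaped parameters."""
    params = {
        "database": database,
        "select": select,
        "from": ",".join(tables),  # comma separated schema.table names
    }
    if where:
        params["where"] = where
    params["limit"] = str(limit)  # "0" means no limit
    return BASE_URL + "?" + urlencode(params)


# Placeholder values, for illustration only.
url = histogram_url(
    "mydb",
    "NJet as njets",
    ["schema1.table1", "schema2.table2"],
    where="RunNumber > 0",
    limit=0,
)
print(url)
```

Escaping matters here because the select and where clauses contain spaces and comparison operators that are not valid in a raw URL.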

The time needed to create one histogram from one table is roughly linear in the number of events (which can be controlled by the limit parameter):


t = 3s + 1s * (nevents / 100000) + 1s * (nevents / 25000)

This time should be multiplied accordingly if more histograms are requested and more tables are queried.
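To get a feeling for the numbers, the formula can be evaluated directly. The function name histogram_seconds is mine; the coefficients are exactly those of the formula above, for one histogram from one table.

```python
def histogram_seconds(nevents):
    """Estimated time in seconds for one histogram from one table."""
    return 3.0 + 1.0 * (nevents / 100000) + 1.0 * (nevents / 25000)


# 100000 events: 3 + 1 + 4 = 8 seconds
print(histogram_seconds(100000))

# With the default limit of 1000 rows the constant 3 s term dominates.
print(round(histogram_seconds(1000), 2))
```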
Histograms

PHP/JavaScript solutions have been abandoned for two reasons:

  1. All PHP/JavaScript solutions draw very nicely, but they don’t understand what a histogram is. That means we would have to implement all the functionality for binning, selecting limits, text legends, etc. ourselves.
  2. There is a serious memory limit on the PHP server, so we could only accumulate histograms of up to about 1000 entries there. (There is no such limit on the new service.)

How Extraction works

Extract
  • All important actions pass via the Athenaeum Web Server, which is then able to act as a manager. This way, the Athenaeum Extract Server doesn’t call its GetFile method directly, but calls GetFile.jsp on the Web Server, and the Web Server then calls GetFile on the Extract Server (and sends a notification mail).
  • A user (directly or via ELSSI) can call GetFile.jsp. GetFile.jsp will loop until the extraction job finishes. It will then return the job summary (success or failure), the same one that it sends in the notification email.
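The "loop until the job finishes" behaviour of GetFile.jsp can be pictured with the following sketch. It is not the JSP code: the function wait_for_summary, the status strings and the polling limit are assumptions, and the job itself is faked with a list of states.

```python
import time


def wait_for_summary(get_status, poll_seconds=1.0, max_polls=600, sleep=time.sleep):
    """Poll the job status until it finishes, then return the summary."""
    for _ in range(max_polls):
        status = get_status()
        if status in ("success", "failure"):
            return status
        sleep(poll_seconds)
    # The upper bound on polls is an assumption added to keep the sketch safe.
    return "timeout"


# Fake job that is still running for two polls and then succeeds.
states = iter(["running", "running", "success"])
summary = wait_for_summary(lambda: next(states), sleep=lambda seconds: None)
print(summary)
```

The caller only sees the final summary, which is what makes a single call to GetFile.jsp sufficient for a user or for ELSSI.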

Extraction Architecture

Extract deployment
  • Each worker node can host several cloned Extract servers running on different ports. Each server can serve multiple clients in different threads; the only limitations are CPU and memory. Extracted files are copied to the appropriate (AFS) directories as requested by ELSSI (using an xrootd server is under consideration). Worker nodes are currently running on lxvm0341 and voatlas18.
  • The Extract manager manages the Extract servers on the worker nodes. It can start new server clones, monitor execution, etc. It dispatches extraction requests to concrete Extract servers. The Extract manager communicates with the Extract servers using Python scripts and XML fragments over an XML-RPC connection. The Extract manager can be accessed via the command line (athenaeum command) or using a GET or POST command over an HTTP connection (when it is deployed in a suitable container such as J2EE/Tomcat). The Extract manager is currently deployed on the CERN J2EE Server.
  • ELSSI accesses the extraction service via the Extract manager. It can also access an Extract server directly, but that would bypass the management layer and could have trouble passing through firewalls.
  • The user accesses the extraction service via the ELSSI service. She can also access it directly, using the command line interface or the Extract manager URL (for testing purposes, etc.).
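The dispatching role of the Extract manager can be sketched as below. How the real manager chooses a server is not described here, so the round-robin policy is purely an assumption, as are the class name ExtractManager and all server URLs except the documented default worker http://lxvm0341.cern.ch:10001.

```python
from itertools import cycle


class ExtractManager:
    """Toy manager that hands requests to Extract servers in turn."""

    def __init__(self, server_urls):
        self._servers = cycle(server_urls)  # round-robin is an assumption

    def dispatch(self, request):
        """Pick the next server and return (server url, request)."""
        return next(self._servers), request


# Two clones on one worker node plus one on another (ports are hypothetical).
manager = ExtractManager([
    "http://lxvm0341.cern.ch:10001",
    "http://lxvm0341.cern.ch:10002",
    "http://voatlas18.cern.ch:10001",
])
targets = [manager.dispatch({"collname": "demo"})[0] for _ in range(4)]
print(targets)
```

Because clients only ever talk to the manager, servers can be added or removed without any change on the client side.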

Three Tier Tag Extraction with Athenaeum

The main aim is to separate the execution of extraction from the management of the extraction servers. The other aim is to facilitate access to the extraction service.

The two services involved are:

  1. A set of XML-RPC servers (currently running on lxvm0341 and voatlas18 at CERN). The role of those servers is to do the work: to execute extraction (or other tasks) using the standard Atlas Python environment.
  2. A J2EE/Tomcat server running on a centrally managed service at CERN. The role of that server is to be a unique front-end to the worker servers. It knows about all running XML-RPC servers (on different ports, machines or even sites) and it knows their characteristics. It can start a new server as needed. At the same time, this server checks pre-conditions and post-conditions of the extraction tasks, accumulates statistics and handles errors.
Three Tier Extract
  • The Athenaeum server can be hosted in the Tomcat/J2EE container (accessed over http) or can exist as a standalone executable. All access to Extraction servers goes via that Athenaeum server. Command line access has two modes: over http or via a direct call to the Athenaeum executable. Both the J2EE server and the command line run the same Athenaeum code (they contain the same Athenaeum.jar library+executable).
  • The Athenaeum server processes the input and creates a simple Python script (Extract.py), which it then sends to one of the XML-RPC servers for execution. That script calls the extraction service on the XML-RPC Server (tagExtract.py). In other words: the Athenaeum server creates an extraction job to be executed on an XML-RPC extraction server.
  • The Athenaeum server can be deployed in any Tomcat container. It is distributed as a single file, Athenaeum.war.
  • The Athenaeum server contains other auxiliary methods, like GetFile.jsp.
  • Clients communicate only via http, which makes the configuration simpler and the service less vulnerable (there is no need to open additional holes in firewalls).
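The "create a script and send it over XML-RPC" step can be illustrated without any running server, using only the XML-RPC encoding from the Python standard library. The remote method name run_script, the helper functions and the content of the generated script are all assumptions; the real Extract.py is not shown in this post.

```python
import xmlrpc.client


def build_job_script(query, output):
    """Manager side: render a tiny job script from the request parameters."""
    return f"query = {query!r}\noutput = {output!r}\n"


def encode_request(script):
    """Wrap the script into an XML-RPC call of a hypothetical method."""
    return xmlrpc.client.dumps((script,), methodname="run_script")


def decode_request(payload):
    """Worker side: recover the method name and the script from the XML."""
    params, method = xmlrpc.client.loads(payload)
    return method, params[0]


script = build_job_script("RunNumber > 0", "/tmp/test.root")
payload = encode_request(script)      # XML fragment sent over the wire
method, received = decode_request(payload)
print(method)
print(received == script)
```

In a real deployment the payload would travel over an http connection to one of the XML-RPC servers; the encoding step is the same.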
Athenaeum Architecture

other Athenaeum posts, ATLAS TAG User Services Overview

Family of Athenaeum Servers

The new version of Athenaeum supports a multiserver configuration. In that configuration, a family of servers (called brothers) runs in parallel. All servers are equal and offer exactly the same functionality. The user can start a new server using the Fork command. The list of running parallel servers can be requested using the Family command (this command also consolidates the family of servers by removing dead brothers and updating databases). Both new methods are available from the command line, from all supported APIs (Java, Python, PHP, C++) and via the Athenaeum Web Service.
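The bookkeeping behind Fork and Family can be pictured with a toy model. This is only a simulation under my own assumptions: the class ServerFamily, the port numbers and the kill method (which fakes a crashed brother) do not come from Athenaeum.

```python
class ServerFamily:
    """Toy registry of brother servers, identified by port number."""

    def __init__(self, first_port=10001):
        self._alive = {}              # port -> True if the server still runs
        self._next_port = first_port  # hypothetical port numbering

    def fork(self):
        """Start a new brother and return its port."""
        port = self._next_port
        self._next_port += 1
        self._alive[port] = True
        return port

    def kill(self, port):
        """Simulate a brother that died without telling anyone."""
        self._alive[port] = False

    def family(self):
        """Drop dead brothers, then list the ports of the running ones."""
        self._alive = {p: ok for p, ok in self._alive.items() if ok}
        return sorted(self._alive)


brothers = ServerFamily()
ports = [brothers.fork() for _ in range(3)]
brothers.kill(ports[1])
print(brothers.family())
```

The point of the model is that listing the family is also the moment when stale entries get cleaned up.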
Family of brother servers

Asynchronous Operations on the Extraction Server

Asynchronous Operations