Chapter 9 More Details on IDL

The basic concepts of IDL were discussed in Section 1.4. This chapter provides details on some of the more obscure or recent additions to IDL, and also discusses how to work around CORBA’s lack of a versioning mechanism.

9.1 Pseudo-IDL, `local` and `native` types

In general, it is never possible to completely define a system in terms of itself, and CORBA is no exception. In particular, the OMG naturally decided to use IDL to define most of the APIs of CORBA, but there were some APIs that were impossible to express in legal IDL. For example, all IDL interfaces implicitly inherit from the base type Object. It is not possible to express the API of the Object interface in syntactically legal IDL because Object is a reserved keyword rather than an identifier. To work around this problem, the OMG used an informal notation called pseudo-IDL (PIDL) to define APIs of built-in types, such as Object. Pseudo-IDL is written as closely as possible to real IDL, but a comment of the form "// PIDL" indicates that the API is not syntactically valid IDL and hence cannot be run through an IDL compiler. PIDL was used extensively in early versions of the CORBA specification.

As CORBA matured, two new keywords—local and native—were introduced to IDL that made it possible to define a greater range of CORBA APIs in IDL. The introduction of these keywords reduced (but did not entirely eliminate) the need for pseudo-IDL.

The local keyword can appear in front of an interface definition. The effect is to define an interface that can be accessed only locally, that is, only within the same process. This keyword is not normally used by application-level developers. Rather, the intention of this keyword is to allow many local-access-only APIs of CORBA to be defined in IDL. For example, DynAny (Section 15.3), Current (Chapter 13), portable interceptors (Chapter 14), Policy (Section 16.1), the ORB itself and many of the types used for implementing server applications (POA, POAManager, ServantManager, and so on; see Chapter 5) are defined as local interface types.

The native keyword is used to indicate that a type is not an IDL type but rather is implemented in the host language, that is, C++/Java/Cobol or whatever programming language is used by developers to implement CORBA applications. A native type can be passed as a parameter only to local interfaces. The purpose of native declarations is to allow parts of CORBA to interact with the host language. For example, CORBA uses the terminology servant (Section 5.2) to refer to the host language object that represents a CORBA object; there is a corresponding declaration:


native Servant;

The POA infrastructure (Section 5.5) defines several local interfaces with operations that take Servant parameters.

9.2 Objects By Value (OBV)

CORBA became popular a few years before Java/J2EE became popular. When J2EE was announced, it was recognized that in some ways CORBA and J2EE complemented each other but that in other ways they were competitors. There was a lot of speculation about whether one of these apparently competing technologies would “beat” the other. There was one particular capability present in Java that was missing from CORBA and some people within the OMG felt that CORBA should be enhanced to provide a similar capability. This feature was to become known as objects by value (OBV). The driving force behind OBV was not technical innovation but rather political and marketing pressure to defend CORBA from the perceived threat of J2EE. Quite predictably, the resulting OBV specification was (and remains) somewhat controversial because it has some technical rough edges and provides capabilities that can be misused easily.

9.2.1 The Java Equivalent of OBV

Before discussing what OBV is from a technical perspective, it is useful to discuss the Java-based technologies that it tries to emulate. Java has built-in support for serializing an object, that is, converting the in-memory representation of an object into a binary buffer and then later converting from the binary buffer back into an in-memory representation. This serialization capability of Java provides a convenient way to persist Java objects, by storing the binary buffer representation in, say, a file or database. It also makes it possible to serialize a Java object into a binary buffer, transmit this buffer across a socket connection to another Java process, and for the receiving process to re-create the Java object in its own address space. In effect, a Java object can be transmitted “by value” from one Java process to another Java process. Actually, the mechanism discussed so far serializes and transmits only the state (instance variables or fields) of the Java object. An object is both state and the operations that manipulate that state. However, Java is also capable of transmitting the bytecode that implements the operations of an object. In this way, Java is able to transmit both the state and operations of an object. Transmitting the bytecode of an object is important because the receiving Java process might not have local access to the relevant bytecode. For example, the receiving Java process might be expecting an object of type Graphic but might actually receive a subtype of Graphic called Circle for which it does not have access to the relevant bytecode.

An obvious question about this Java capability is: Is this really useful? A typical usage for this is the following interaction between a client application and server application:

1. The client invokes an operation on the server. The return value of the operation is an object (state and, if required, bytecode).
2. The client invokes many fine-grained operations upon its local (copy of the) object.
3. When the client has finished making its updates to the local object, it then makes a remote call to the server application, and passes the (updated) object as a parameter.

The main benefits offered by this usage scenario are as follows:

Passing objects (by value) between processes can provide a significant optimization. In step 2 above, having the client make fine-grained operation calls upon a local object is much faster than making similar calls upon a remote object. This is because a remote call typically involves a few milliseconds of network latency; the local calls do not have this overhead.
The same optimization could be achieved by passing just data—for example, structs and sequences—between the client and server. However, this would expose the client to the low-level data directly. It is better for these low-level implementation details to be hidden within operations, particularly if the bytecode of these operations can be transmitted automatically from the server to the client.

9.2.2 Objects By Value in CORBA

A CORBA interface has operations but no state variables. In contrast to this, a CORBA struct has state variables (fields) but no operations. A new construct, called a valuetype, has been introduced to IDL. A valuetype looks like a cross between an interface and a struct because it has both operations and state variables. Some examples of valuetype declarations are shown in Figure 9.1.


valuetype Date {
  public short   year;
  public short   month;
  public short   date;
  void           next_day();
  void           previous_day();
};
valuetype OptionalString string;

Figure 9.1: Example of IDL valuetype definitions

When a valuetype is passed as a parameter, its state variables are transmitted. Operations invoked upon a valuetype are always invoked on the local (copy of the) valuetype. In general, it is not feasible for the code that implements the bodies of operations to be transmitted, because the client application and server application may be implemented with different programming languages and/or on different CPU types. For this reason, the client and server application developers must write and maintain separate implementations of the valuetype’s operations. This requirement introduces a big problem: there is no guarantee that the server-side implementation of the valuetype operations is semantically equivalent to the client-side implementations of the same operations. When developing the first version of the client and server applications, developers on both sides will probably take great care to ensure that the client-side and server-side operations have equivalent semantics. However, during ongoing maintenance of the applications, it is quite possible that a change in semantics (perhaps in the form of a bug-fix or a buggy optimization) will be introduced into the server-side implementation of the operations, but that a similar change will not be made in the client-side operations. During the lifetime of a project, there may be one server implemented in, say, C++, and several different kinds of clients, each of which is implemented in different languages, such as Java, Ada and Cobol. Maintaining semantic equivalence of operations implemented in multiple programming languages and used in multiple applications can quickly become a significant burden.
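The drift described above can be made concrete with a small sketch. It is written in Python purely for illustration and does not use any CORBA API: the classes ServerDate and ClientDate, the marshal() helper and the simplified 31-day calendar are all assumptions invented for this example. The point it tries to show is that only the state variables of the Date valuetype cross the wire, so each side runs whatever implementation of next_day() it happens to have.

```python
from dataclasses import dataclass, astuple


@dataclass
class ServerDate:
    """Hypothetical server-side implementation of the Date valuetype."""
    year: int
    month: int
    date: int

    def next_day(self) -> None:
        # Simplified calendar (assumption): every month has 31 days.
        self.date += 1
        if self.date > 31:
            self.date = 1
            self.month += 1
        if self.month > 12:
            self.month = 1
            self.year += 1


@dataclass
class ClientDate:
    """Hypothetical client-side implementation, maintained separately."""
    year: int
    month: int
    date: int

    def next_day(self) -> None:
        # Semantic drift: the month roll-over logic was never added here.
        self.date += 1


def marshal(value) -> tuple:
    """Stand-in for marshaling: only the state variables are transmitted."""
    return astuple(value)


wire = marshal(ServerDate(2024, 1, 31))
server_copy = ServerDate(*wire)   # the server works on its own copy
client_copy = ClientDate(*wire)   # the client rebuilds the value locally

server_copy.next_day()            # becomes 2024-02-01
client_copy.next_day()            # becomes 2024-01-32, silently inconsistent
```

Both sides start from identical state, yet they disagree after a single call to the "same" operation. Nothing in the middleware detects this, which is why the maintenance burden falls entirely on the developers.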

Opponents of valuetype point out that distributed applications have been successfully developed and deployed for several decades without the use of valuetype (or something similar). Because of this, valuetype is not an essential feature of CORBA and can (and probably should) be ignored.

Having discussed one of the main drawbacks of valuetype, I now briefly list some of the extra capabilities that valuetypes provide.

valuetype Base {
  public long    some_data;
};
valuetype Derived : Base {
  public long    more_data;
};

Figure 9.2: Inheritance of valuetype definitions

First, if you declare a valuetype that contains state variables but no operations then it is semantically similar to a struct but has one additional benefit: you can have single inheritance of such valuetypes.¹ This is shown in Figure 9.2. In effect, you can think of a valuetype as being a struct with inheritance.

Second, a valuetype is always passed in a manner similar to a C++ pointer (a reference in Java). For example, if a field within a valuetype is another valuetype then this field is a pointer to the embedded valuetype. It is legal to use a null pointer where a valuetype is expected. By introducing pointer semantics to IDL, valuetypes allow you to model cyclic graph structures. Also, if you declare a valuetype that has just one field, say, a string, then this allows you to pass a “normal” string (embedded inside a valuetype) or a null pointer as a parameter. In effect, this is a convenient way to pass an “optional value” as a parameter.² The designers of OBV felt that the “optional value” usage of valuetype would be useful often enough that they invented some syntactic sugar for it. This syntactic sugar is illustrated by the OptionalString declaration in Figure 9.1. This syntactic sugar format is usually referred to as a valuebox.

9.3 Versioning

CORBA does not have a mechanism for versioning IDL definitions. Unfortunately, there is widespread confusion about this. The confusion arises because CORBA 1 defined a syntactic place-holder for a possible future versioning mechanism. The syntactic place-holder was called #pragma version and it was intended to be used in IDL files as shown in the example below:


#pragma version "1.2"

The "1.2" was intended to indicate a version number for the following IDL construct.

A versioning mechanism requires more than just a syntactic construct: it requires additional supporting infrastructure. However, the OMG has never defined the necessary supporting infrastructure to make #pragma version useful. Because of this, #pragma version “is a historical relic and is ignored by the ORB” [HV99, Section 4.19.3]. Unfortunately, the continued presence of this syntactic place-holder leads many people to incorrectly assume that CORBA has a versioning mechanism and they then waste time and effort trying to make use of it.

Given that CORBA does not have a built-in versioning mechanism, the question then arises of whether there is any way to fake a versioning mechanism. Two (imperfect) suggestions are discussed below.

One approach is to (mis)use inheritance as a versioning mechanism. For example, let us assume that you have an existing IDL interface called Account and you want to create a new version that has additional functionality. You can do this by defining a new interface called, say, Account2 that inherits from Account and adds new operations.³ This approach works if the new version of the interface only adds new functionality; it will not work if you need to delete or modify the signatures of existing operations. Also, this approach will result in a deep inheritance hierarchy if you use it to define several versions of an interface.

Figure 9.3: A copy-and-modify approach to versioning

Another approach to faking versioning is to define a new, unrelated interface. This is illustrated in Figure 9.3. The original IDL types for an application are defined in module Finance (shown in the box on the left). When a new version of the application is being developed, a copy is made of the IDL file and the module is renamed from Finance to Finance2.⁴ Then the types within Finance2 can be modified without restriction. As far as humans are concerned, Finance2::Account is “similar to” Finance::Account and so they can think of them as being different versions of the same interface. However, this “versioning” is entirely within the minds of humans. As far as CORBA is concerned, the two interfaces are semantically unrelated.

In general, it is good coding practice to define all types inside modules, as this reduces namespace pollution. The use of modules offers another benefit for versioning: it is much more convenient to embed the version number in the name of one module rather than embed the version number in the names of the, possibly numerous, data-types defined within the module. Also, when updating version 1 of the source-code of an application to produce version 2, a single global-search-and-replace within source-code files for the name of the module is easy to perform.

It should be noted that the lack of a built-in versioning mechanism is not unique to CORBA. Most middleware systems lack a versioning mechanism, as do most programming languages.

9.4 Repository IDs

A repository id is a slightly mangled form of the fully-scoped name of an entry in an IDL file. For example, the repository id of Finance::Account is "IDL:Finance/Account:1.0". In general, all occurrences of "::" in the fully-scoped name are replaced with "/". The resulting string is then prefixed with "IDL:" and suffixed with ":1.0".⁵

IDL allows a #pragma prefix "..." construct to be used in IDL files. An example is shown below:


#pragma prefix "acme.com"
module Finance {
    interface Account { ... };
};

If a #pragma prefix directive is used in an IDL file then the specified prefix ("acme.com" in the above example) is embedded into the repository ids for all types in that file. For example, the repository id for type Finance::Account is "IDL:acme.com/Finance/Account:1.0".
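The mangling rules described in this section can be summarised in a few lines of code. The following is a minimal sketch in Python; the function name repository_id and its parameters are hypothetical and are not part of any CORBA product. It covers only the "IDL:" format discussed here.

```python
def repository_id(scoped_name: str, prefix: str = "", version: str = "1.0") -> str:
    """Build an "IDL:" format repository id from a fully-scoped IDL name.

    scoped_name -- for example "Finance::Account"
    prefix      -- the string given in a #pragma prefix directive, if any
    version     -- the version suffix; "1.0" unless changed
    """
    # Replace every "::" scope separator with "/".
    path = scoped_name.replace("::", "/")
    # A prefix, if present, is placed in front of the mangled name.
    if prefix:
        path = prefix + "/" + path
    # Add the "IDL:" format tag and the ":<version>" suffix.
    return "IDL:" + path + ":" + version


plain = repository_id("Finance::Account")
prefixed = repository_id("Finance::Account", prefix="acme.com")
```

With these inputs, plain holds "IDL:Finance/Account:1.0" and prefixed holds "IDL:acme.com/Finance/Account:1.0", matching the two examples given in the text.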

Repository ids are a form of runtime type information. Most CORBA applications rely on compile-time type checking so repository ids are not used very frequently. However, most CORBA developers do encounter repository ids occasionally, so it is useful to know what they are and what their intended usage is. Here is an incomplete list of when repository ids are used:

As mentioned in Section 1.5, an interoperable object reference (IOR) contains the “contact details” for an object. However, the IOR may also contain the repository id for the object’s type.⁶ The presence of a repository id in an IOR is a very useful debugging aid. For example, most CORBA implementations provide a command-line utility that can print out the repository id and contact details contained inside an IOR (Section 3.4.2). It is common for people to use such utilities to help them diagnose problems when developing and deploying a client-server system. Often a problem is due to a client being given the wrong kind of IOR, for example, an IOR for an Employee object rather than an IOR for a Finance::Account object. Being able to see the repository id embedded in an IOR often helps people to diagnose these kinds of problems easily.
When an operation in a server application throws an exception, the exception’s repository id is marshaled (serialized) first, followed by the fields within the exception. In this way, the CORBA runtime system in the client application can use the repository id to determine the exception’s type; this then tells the CORBA runtime system how it should unmarshal (deserialize) the fields of the exception. A discussion about marshaling can be found in Section 11.2.
Repository ids are used in programs that utilize meta-information (Chapter 15).
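The exception case in the list above can be sketched as follows. This is an illustration only, written in Python: a plain list stands in for the real marshaled byte stream, and the two exception types, their fields and the EXCEPTION_FIELDS table are hypothetical. In a real ORB, the equivalent knowledge comes from code generated by the IDL compiler.

```python
# Hypothetical table: repository id -> field names, in marshaling order.
EXCEPTION_FIELDS = {
    "IDL:acme.com/Finance/InsufficientFunds:1.0": ("requested", "available"),
    "IDL:acme.com/Finance/AccountClosed:1.0": ("reason",),
}


def marshal_exception(repo_id, fields):
    """Server side: emit the repository id first, then the field values."""
    return [repo_id] + [fields[name] for name in EXCEPTION_FIELDS[repo_id]]


def unmarshal_exception(buffer):
    """Client side: read the repository id, then use it to decode the rest."""
    repo_id, values = buffer[0], buffer[1:]
    names = EXCEPTION_FIELDS[repo_id]  # the id says which fields follow
    return repo_id, dict(zip(names, values))


wire = marshal_exception(
    "IDL:acme.com/Finance/InsufficientFunds:1.0",
    {"requested": 500, "available": 120},
)
repo_id, fields = unmarshal_exception(wire)
```

Because the repository id is the first item in the buffer, the receiver can decide how to interpret the remaining items before it reads them. Without it, the values 500 and 120 would be indistinguishable from the fields of any other exception.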

I mentioned earlier that use of a #pragma prefix "..." directive causes the specified prefix to be embedded in the repository ids for all types in that IDL file. Use of a #pragma prefix directive does not affect the public API that is generated by an IDL compiler. For example, it does not affect the public API of the C++ or Java types generated by an IDL compiler. However, it does affect the implementation of the generated operations that return the repository ids of IDL types. This is because the string returned by these operations must embed the string used in a #pragma prefix directive.

Sometimes people wonder what purpose is served by placing #pragma prefix directives in IDL files. The answer can be illustrated with an example. Let us assume that the Bank of America defines a module called Finance that contains an Account interface. Without use of a #pragma prefix directive, the repository id of this type is "IDL:Finance/Account:1.0". The problem is that the Bank of America might not be the only organization in the world to define an interface called Finance::Account. If another organization defines an interface with the same name (and presumably with operations that have different signatures) then it might be difficult to diagnose problems if a client application that was written to communicate with a Bank of America server accidentally gets an IOR for a Finance::Account object in a different organization. To avoid such problems, developers are encouraged to put a #pragma prefix into all their IDL files. The prefix string should contain something that is unique to the developer’s organization. Typically, an Internet domain name is used, as this is a globally unique identifier. For example, IDL files written by developers in the Bank of America might contain the following:


#pragma prefix "bankofamerica.com"

Now, the repository id for the Finance::Account interface defined in such a file is:


IDL:bankofamerica.com/Finance/Account:1.0

If a client application is developed with the Bank of America IDL files then the CORBA runtime system in this client application will throw an exception if it is mistakenly given an IOR that contains an inappropriate repository id such as "IDL:Finance/Account:1.0" or "IDL:bankofengland.co.uk/Finance/Account:1.0".

9.5 Miscellaneous New Keywords

The following new keywords have been added to IDL in recent years.

The typeprefix keyword serves a purpose similar to the #pragma prefix construct discussed in Section 9.4. An example of its use is shown below:


module CosNaming {
    typeprefix CosNaming "omg.org";
    ...
};

In this example, the typeprefix declaration causes the "omg.org" prefix to be embedded in the repository ids for the CosNaming module and all types declared inside it.

The import keyword serves a purpose similar to that of the #include directive discussed in Section 1.4.1. An example of its use is shown below:


import CosNaming;

As shown in this example, import is typically followed by the name of a module. It has the effect of including the IDL file that contains that module.

An operation can have a raises clause, which means that it can raise user-defined exceptions. In contrast, for many years an attribute (which is, in essence, syntactic sugar for a pair of get- and set-style operations) could not have a raises clause, so it could not raise exceptions. The new keywords getraises and setraises have been added to IDL to specify what exceptions can be raised by the get- and set-style operations for which an attribute is syntactic sugar. An example of their use is shown below:


exception X { ... };
exception Y { ... };
exception Z { ... };
interface Foo {
    attribute string name getraises(X, Y) setraises(Y, Z);
};

1: You can use multiple inheritance if valuetypes have operations but no state variables.
2: IDL provides two other ways to pass an “optional value”. One way is to use a sequence of length 1 to hold the value and a sequence of length 0 to indicate “no value”. The other way is to use a union. The union’s discriminant (case label) can indicate whether or not the intended value is provided.
3: The OMG used this approach with the Naming Service. The first version of the Naming Service defined an interface called NamingContext (in module CosNaming). “Version 2” of the Naming Service was defined in an interface called NamingContextExt that inherited from NamingContext and added some new operations.
4: The naming scheme would be more consistent if the original module had been called Finance1 rather than Finance. However, such foresight is rarely found in reality and so version numbers usually are not embedded in the name of the original module.
5: The "1.0" suffix denotes the version number. This version number was incorporated into the repository id to support the (later abandoned) versioning mechanism discussed in Section 9.3. A #pragma version directive could be used to change the version number embedded in a repository id, but there is no point in doing this because, as explained in Section 9.3, CORBA does not offer a proper versioning mechanism.
6: The CORBA specification states that an IOR is not obliged to contain a repository id; an IOR may contain an empty string instead. However, most CORBA implementations embed a repository id into IORs.