gbCodeLib - Language Definition Files

The gbCodeLib source code editor can selectively apply formatting (such as colors and fonts) to common features of language source code as a visual aid to help users read, edit, and understand the source code.

For example, the source code editor can be instructed to display all language keywords in one color and all comments in another. This is sometimes referred to as applying 'color syntax' to the source code, but gbCodeLib supports many more formatting options than color alone. The gbCodeLib source code editor recognizes over 30 common language features, each of which may be assigned its own formatting rules.

The display instructions are kept in plain-text language definition files in XML format. gbCodeLib ships with language definition files for about 20 different languages. Users can modify these files to suit their own display preferences and can create language definition files for other languages.

However, defining display properties for 30+ features of a language can be somewhat tedious. If you plan to do a lot of language file editing, you should consider using the freeware program gbXML, which provides a graphical interface for creating language definition files.

The gbXML Help pages provide details of the language definition files, but for those of you interested in the short version, here are the basics:

XML Language File Format
The XML language files used by gbCodeLib consist of the following hierarchy of elements: language > tokenset > validscope|tokens|tokens2. The basic element structure of an XML language file looks like this:

<language>
    <tokenset>
        <validscope ... />
        <tokens>  ... </tokens>
        <tokens2> ... </tokens2>
    </tokenset>
</language>

A complete language definition file typically consists of 5-10 tokenset elements, each with 2-4 validscope elements, 1 tokens element, and 1 tokens2 element.

Example
Here's an example of a simple XML language definition file which supports the keywords if, else, and while. Both {} and [] character pairs are used to define the blocks of code and can be nested:

<language name="TestLanguage" casesensitive="yes">
	<tokenset name="Scope Keywords" type="scope"
                                 autoindent="yes" forecolor="red">
		<validscope name="Scope Keywords"/>
		<validscope name=""/>
		<tokens>
			<token>{</token>
			<token>[</token>
		</tokens>
		<tokens2>
			<token>}</token>
			<token>]</token>
		</tokens2>
	</tokenset>
	<tokenset name="Keywords" forecolor="blue" fontstyle="bold">
		<validscope name="Scope Keywords"/>
		<validscope name=""/>
		<tokens>
			<token>if</token>
			<token>else</token>
			<token>while</token>
		</tokens>
	</tokenset>
	<tokenset name="Text">
		<validscope name="Scope Keywords"/>
		<validscope name=""/>
	</tokenset>
</language>

Additional details on each element type are provided below.

Concepts - Tokens and Tokenset
The text which makes up language source code can be thought of as consisting entirely of 'tokens' - groups of text characters, such as keywords, variables, operators, and other symbols specific to the language. Tokens are often thought of as 'words', but a token can also be multiple words, such as the text string 'End Function', which is used to terminate functions in some languages. The purpose of a language definition file is to identify groups, or lists, of tokens and to specify the formatting options to be applied to those tokens.

The list of tokens is placed within a tokenset element, whose attributes contain the formatting instructions. A tokens sub-element contains the actual list of tokens. Typically, the list is simply a multi-line listing of every token to which the formatting will be applied.

However, a list of tokens may also be defined through the use of regular expressions. This tutorial does not provide background on regular expressions but there are many online tutorials you may want to read if you're not already familiar with the concept. Using a single regular expression to define an entire list of tokens is a powerful simplifying tool for creating language definition files.

Tokenset Types
There are actually two types of tokensets - list and scope. As the name implies, a list tokenset is simply a listing of all the source code words or character strings (tokens) which belong to the tokenset. As noted above, a regular expression may also be used to define the list of tokens.

Here's a simple example of a list tokenset with only a few tokens (the attributes of each XML element will be discussed later):

<language name="java">
    <tokenset name="Common Words" id="keywords" type="list" forecolor="red">
        <tokens>
            <token>if</token>
            <token>while</token>
            <token>end</token>
        </tokens>
    </tokenset>
</language>

In this example, three tokens (if, while, end) are defined and will be displayed in the color red.
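
A regular expression can stand in for an explicit list of tokens. The following is only a sketch: it assumes that the regexp="yes" attribute on the tokens element works for a list tokenset the same way it is used in the hyperlink example further down this page. The tokenset name, id, color value, and the pattern itself are made up for illustration and are not taken from a language file shipped with gbCodeLib:

<language name="java">
    <tokenset name="Numbers" id="numbers" type="list" forecolor="green">
        <!-- assumption: regexp="yes" is honored for list tokensets -->
        <tokens regexp="yes">
            <token>[0-9]+</token>
        </tokens>
    </tokenset>
</language>

If that assumption holds, every run of digits would be displayed in green without having to list each number as a separate token.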

The second kind of tokenset, a scope tokenset, is a list of one or more token pairs. Each pair consists of a starting and ending token, where specific display characteristics are applied to the source code between the two tokens (as well as to the tokens themselves). For example, in most languages double-quotes are used to enclose strings. A scope tokenset which defines a pair of double-quote tokens would be used to apply formatting to all source code between the two double-quote characters.

A scope tokenset can include multiple token pairs. The first token of a pair is placed in a tokens element. The second, corresponding token of a pair is placed in a tokens2 element. The tokens and tokens2 elements can each contain any number of tokens, but both must contain the same number, so that every starting token has a corresponding ending token.

Here's an example of a scope tokenset.

<language name="java">
    <tokenset name="String Tokens" id="strings" type="scope" forecolor="blue">
        <tokens>
            <token>"</token>
            <token>'</token>
        </tokens>
        <tokens2>
            <token>"</token>
            <token>'</token>
        </tokens2>
    </tokenset>
</language>

In this example, two pairs of tokens are defined - a pair of double-quotes and a pair of single quotes - both of which are used to enclose strings in some languages. Source code between either token pair would be colored blue in this example.
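
The starting and ending tokens of a pair do not have to be the same characters. Here's a sketch of a scope tokenset for block comments, assuming that pairs are matched by position (the first token in tokens goes with the first token in tokens2, and so on). The tokenset name, id, and color value are illustrative only:

<language name="TestLanguage">
    <tokenset name="Block Comments" id="blockcomments" type="scope" forecolor="green">
        <tokens>
            <token>/*</token>   <!-- pair 1 start -->
            <token>(*</token>   <!-- pair 2 start -->
        </tokens>
        <tokens2>
            <token>*/</token>   <!-- pair 1 end -->
            <token>*)</token>   <!-- pair 2 end -->
        </tokens2>
    </tokenset>
</language>

Under that assumption, text between /* and */, or between (* and *), would be formatted as a comment, along with the tokens themselves.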

gbCodeLib language definition files also support single-token scope definitions - where a single token is used to define the start of a scope and the end of the line of text defines the end of the scope. In such cases, only the tokens element is needed - no tokens2 element is required.

For example, a single quote is used in Visual Basic to represent comments. The end of the line defines the end of the comment scope.
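
A single-token scope for Visual Basic comments might look something like the sketch below. It reuses the attributes shown in the earlier examples. The language name, tokenset name, id, and color value are assumptions chosen for illustration:

<language name="vb">
    <tokenset name="Comments" id="comments" type="scope" forecolor="green">
        <!-- no tokens2 element: the scope ends at the end of the line -->
        <tokens>
            <token>'</token>
        </tokens>
    </tokenset>
</language>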

Validscopes
Sometimes, language elements may be embedded within one another. For example, a comment string may have a hypertext link embedded within it. The language definition files can be written to recognize such occurrences, applying formatting to the embedded elements that differs from the formatting applied to the enclosing element.

The validscope element is used to identify the tokensets within which a given tokenset is valid (recognized). A tokenset may contain any number of validscope elements, but only 2-4 validscopes are usually required to describe most languages.

Here's an XML example showing how to indicate that a hyperlink should be valid within a string scope tokenset. In this case, note that the hyperlink token is defined as a regular expression.

<language name="java">
    <tokenset name="String Tokens" id="strings" type="scope" forecolor="blue">
        <tokens>
            <token>"</token>
        </tokens>
        <tokens2>
            <token>"</token>
        </tokens2>
    </tokenset>
    <tokenset name="Active Links" id="hyperlinks" type="scope" forecolor="red">
        <validscope name="String Tokens" />
        <tokens regexp="yes" >
            <token> https?://([\.~:?#=\w]+\w+)(/[\.~:?#=\w]+\w)* </token>
        </tokens>
    </tokenset>
</language>

In this example, the hyperlink would be displayed as red text within a blue text string.
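
Because a tokenset may contain more than one validscope element, the same hyperlink tokenset could be made valid inside comments as well as strings. The sketch below assumes that each validscope name must match the name attribute of the enclosing tokenset, as it does in the example above. The "Comments" tokenset and its color value are hypothetical additions for illustration:

<language name="java">
    <tokenset name="String Tokens" id="strings" type="scope" forecolor="blue">
        <tokens>
            <token>"</token>
        </tokens>
        <tokens2>
            <token>"</token>
        </tokens2>
    </tokenset>
    <tokenset name="Comments" id="comments" type="scope" forecolor="green">
        <tokens>
            <token>//</token>
        </tokens>
    </tokenset>
    <tokenset name="Active Links" id="hyperlinks" type="scope" forecolor="red">
        <!-- one validscope element per enclosing tokenset -->
        <validscope name="String Tokens" />
        <validscope name="Comments" />
        <tokens regexp="yes" >
            <token> https?://([\.~:?#=\w]+\w+)(/[\.~:?#=\w]+\w)* </token>
        </tokens>
    </tokenset>
</language>

With this arrangement, a hyperlink would be displayed as red text whether it appears inside a blue string or inside a green comment.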