Archive for June, 2009
English, Joshua S. (2003)
This document describes the English Stemming Algorithm, which reduces the morphological variation of a given stream of tokens through conflation, a process commonly called stemming. Stemming is used in information retrieval (IR) applications to enhance performance.
The English Stemming Algorithm (ESA) uses a list of rules to reduce any given token, such as a word, to a common base. The rules are applied iteratively by matching a suffix string against token endings. Each rule carries a flag that determines whether it may apply to a token that has already been modified by a previous rule. Each rule specifies removing, replacing, or adding characters at the end of the token.
The ordering of the rules is significant because a rule may be designed to act upon changes made by a previous rule. Each rule may be of arbitrary length and consists of one or more instances of the four ESA rule components. The list of rules is processed iteratively until each rule has had a chance to apply itself to the token. For the second or later instance within a rule, the flag applies only to changes made by that same rule.
The ESA imposes the following boundary conditions. The input must be greater than two characters in length and must contain at least one vowel and one consonant. The output must be greater than two characters in length.
Rules consist of the following four parts:
- A suffix string (token ending) of 1 or more characters.
- A flag (y, Y, n, or N) indicating whether the rule can be applied to the token. If the flag is ‘y’ or ‘Y’ and the token has already been modified by another rule or by this same rule (for rules with multiple instances), the rule cannot be applied to the token.
- The number of characters to remove.
- The string to append to the token consisting of 0 or more characters.
In the reference implementation, rules were formatted with a comma character as a delimiter. For example:
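With an assumed flag letter (‘N’ taken, per the flag definition above, to mark a rule applied regardless of prior modification), such a rule might be:

ing,N,3,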
The above rule matches tokens ending in ‘ing’; it is applied regardless of prior token modification, removes three characters, and appends nothing. The rule format may be repeated to create more complex rules. For example:
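The suffixes and flags below are likewise assumptions; a hypothetical extended rule joining two four-part instances might be:

ied,N,3,y,ing,N,3,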
Note that the rule format is the same but is simply extended using the defined ESA format. It was noted that future implementations might use a different character to delineate the rule sections for ease of modification and readability.
The ESA was implemented and tested using a simple rule list containing 67 items that were derived from the rule set used by the Porter Stemming Algorithm (Porter, M.F., 1980).
It was noted during implementation that organizing the rules in a tree structure would yield the highest possible performance, since a simple tree search could determine the rule(s) to be used for any given token.
However, for the sake of simplicity, the rules were organized into an array where they could be tested iteratively against a stream of tokens. Even this low-performance design resulted in quick, powerful stemming.
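For illustration, here is a minimal C++ sketch of that array-based design. The type and function names are assumptions, not the reference implementation, and multi-instance rules are flattened into separate array entries rather than chained:

```cpp
#include <cctype>
#include <string>
#include <vector>

// One instance of the four ESA rule components (names are illustrative).
struct EsaRule {
    std::string suffix;   // token ending to match
    char flag;            // 'y'/'Y': do not apply if token already modified
    size_t remove;        // number of characters to remove from the end
    std::string append;   // string of 0 or more characters to append
};

static bool hasVowelAndConsonant(const std::string& token) {
    bool vowel = false, consonant = false;
    for (char c : token) {
        char lc = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        if (lc == 'a' || lc == 'e' || lc == 'i' || lc == 'o' || lc == 'u') vowel = true;
        else if (std::isalpha(static_cast<unsigned char>(c))) consonant = true;
    }
    return vowel && consonant;
}

// Each rule in the array gets one chance, in order, to apply to the token.
std::string stem(std::string token, const std::vector<EsaRule>& rules) {
    // Input boundary conditions: more than two characters,
    // at least one vowel and one consonant.
    if (token.size() <= 2 || !hasVowelAndConsonant(token)) return token;
    bool modified = false;
    for (const EsaRule& rule : rules) {
        if ((rule.flag == 'y' || rule.flag == 'Y') && modified)
            continue;   // flag forbids applying to an already-modified token
        if (token.size() < rule.suffix.size() || rule.remove > token.size())
            continue;
        if (token.compare(token.size() - rule.suffix.size(),
                          rule.suffix.size(), rule.suffix) != 0)
            continue;   // suffix does not match the token ending
        std::string candidate =
            token.substr(0, token.size() - rule.remove) + rule.append;
        if (candidate.size() > 2) {   // output boundary condition
            token = candidate;
            modified = true;
        }
    }
    return token;
}
```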
The ESA test performance time was 0.000058 seconds per word on a Pentium Celeron 500 MHz system. The reference implementation has currently only been compared to the Porter Stemming Algorithm which has a performance time of 0.000072 seconds per word on the same system.
The ESA is comparable to the Paice/Husk or Lancaster Stemming Algorithm (Paice, C. & Husk, G., 1990) in that it uses a series of rules. However, the ESA is able to use a shorter (and therefore faster) rule set than Paice/Husk due to its ability to have extended and interdependent rules.
The ESA is also comparable to the Porter Stemming Algorithm in that it uses a similar rule set to accomplish stemming. The ESA is simpler and faster because it processes tokens in a single step, unlike Porter’s multi-step process.
The ESA is superior to both the Lovins (Lovins, J.B., 1968) and Dawson (Dawson, J.L., 1974) stemming algorithms in the same fashion that Paice/Husk is: it is not hampered by post-processing (as in Lovins) or by ‘partial matching’ (as found in Dawson), and it is not restricted to a fixed list of suffixes as both of those algorithms are.
The ESA is also much faster than inflectional linguistic morphology algorithms such as Krovetz (Krovetz, R., 1993). It is also much more reliable, because English is a morphologically weak language and does not lend itself to inflectional morphological analysis.
Download reference-implementation source here: download
In a previous blog post, I discussed how the unified lighting and shading were applied algorithmically. Here are two screenshots of the same object in the same scene, with soft shadows: one uses the lighting engine’s calculation, and the other uses default OpenGL lighting from the same point source.
The following are some screenshots from the Xaede game engine’s unit-tests.
Radial Blur Test
Physics with Shadow Projection Test
Vehicle Physics Test
When reading about next-generation game engine development, I came across a game development project entitled Project Offset. This game engine features a concept in three-dimensional (3D) computer graphics rendering where all of the objects in a scene are both lit and shadowed in a single pass. This is referred to as a unified lighting & shading engine.
I then proceeded to explore this concept by prototyping a unified lighting and shading engine using the OpenGL high-level shading language (GLSL) in the Xaede Game Engine’s shader system. The advantage of using GLSL is that the resulting program is compiled and run directly on the video card’s graphics processing unit (GPU) and resides within the video card’s memory.
The goal for this prototype was to produce a rendering pipeline that could calculate both lighting and shadowing in a single mathematical calculation while providing correct object shadows, self-shadowing, and soft-shadows – thus accurately mimicking the next-generation technology exhibited by the Project Offset engine.
I began by creating a vertex shader – this is a GLSL computer program that is called whenever a new vertex is rendered to the scene – to determine where the vertex lies within the projection and model-view matrices, to retain the corresponding pixel coordinates for the vertex, to preserve the current color of the pixels associated with the vertex, and to calculate a normal vector from the vertex against the angle of the lighting source.
The information calculated and retained by the vertex shader is then passed to the GLSL pixel shader. The pixel shader is the program applied to each pixel rendered in the scene to create the 2D frame composite of the scene’s 3D objects for presentation to the user. The pixel shader begins by calculating the lighting value for the pixel, then the shading value. The lighting algorithm works in the following fashion:
Note that the following values are represented as red-green-blue-alpha (RGBA) and therefore mathematically calculable as x-y-z-w vectors.
The emissive light value is not calculated, but rather set as a constant value.
v(eL) = eL
The ambient lighting value is calculated by multiplying the ambient light amount by the light’s color.
v(aL) = (aL * lC)
The diffuse lighting value is calculated by first subtracting the vertex location (obtained from the vertex shader) from the light location and normalizing the result. This light vector is then used in a dot product against the vertex normal; the result is referred to as the diffuse “term”.
lV = normalized(lL – vL)
dT(dL) = (vN dot lV)
If the diffuse term exceeds 0.0 (i.e. it has a real-number value greater than zero), the local diffuse lighting value is calculated as the diffuse light intensity multiplied by the light color multiplied by the diffuse term, clamped between 0.0 and 1.0. A positive diffuse term also allows the specular lighting value to be calculated.
v(dL) = (dL * lC * dT)
The specular lighting value is only calculated when the diffuse term exceeds 0.0. First, a direction vector is obtained by subtracting the vertex location (obtained from the vertex shader) from the camera location and normalizing the result. A specular vector is then calculated by normalizing the sum of the light vector (obtained from the diffuse lighting calculation) and the direction vector. The specular “term” is the dot product of the vertex normal (obtained from the vertex shader) with the specular vector, raised to the power of the light’s shininess value.
dV = normalized(cL – vL)
sV = normalized(lV + dV)
sT(sL) = ((vN dot sV) ^ shininess)
The specular lighting value is then the result (clamped between 0.0 and 1.0) of the specular light intensity multiplied by the light’s color multiplied by the specular term.
v(sL) = (sL * lC * sT)
The final lighting value is obtained by adding the emissive, ambient, diffuse, and specular values; the result is multiplied by the original pixel color. This calculation either brightens or darkens a given pixel based upon the result of the lighting calculation.
light = ((v(eL) + v(aL) + v(dL) + v(sL)) * pC)
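To make the sequence concrete, here is a CPU-side C++ sketch of the same lighting math. The vector type, the RGB-only treatment, and all parameter names here are assumptions for illustration; the actual implementation is a GLSL pixel shader:

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
static Vec3  operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3  operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3  operator*(Vec3 a, float s){ return {a.x * s, a.y * s, a.z * s}; }
static Vec3  operator*(Vec3 a, Vec3 b) { return {a.x * b.x, a.y * b.y, a.z * b.z}; }
static float dot(Vec3 a, Vec3 b)       { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3  normalized(Vec3 a) {
    float len = std::sqrt(dot(a, a));
    return {a.x / len, a.y / len, a.z / len};
}
static Vec3 clamp01(Vec3 a) {
    return {std::clamp(a.x, 0.f, 1.f), std::clamp(a.y, 0.f, 1.f),
            std::clamp(a.z, 0.f, 1.f)};
}

// Per-pixel lighting following the equations above (RGB only, for brevity).
Vec3 lightPixel(Vec3 pC,            // original pixel color
                Vec3 eL,            // constant emissive light value
                float aL, Vec3 lC,  // ambient light amount, light color
                float dL, float sL, // diffuse and specular light intensity
                float shininess,
                Vec3 lL, Vec3 vL,   // light and vertex locations
                Vec3 vN, Vec3 cL)   // vertex normal, camera location
{
    Vec3 ambient = lC * aL;                  // v(aL) = aL * lC
    Vec3 lV = normalized(lL - vL);           // light vector
    float dT = dot(vN, lV);                  // diffuse term
    Vec3 diffuse{0, 0, 0}, specular{0, 0, 0};
    if (dT > 0.0f) {
        diffuse = clamp01(lC * (dL * dT));   // v(dL) = dL * lC * dT
        Vec3 dV = normalized(cL - vL);       // direction vector to camera
        Vec3 sV = normalized(lV + dV);       // specular vector
        float sT = std::pow(std::max(dot(vN, sV), 0.0f), shininess);
        specular = clamp01(lC * (sL * sT));  // v(sL) = sL * lC * sT
    }
    // light = (v(eL) + v(aL) + v(dL) + v(sL)) * pC
    return (eL + ambient + diffuse + specular) * pC;
}
```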
The set of coordinates defining the object’s shadow is calculated by dividing the projection coordinates (obtained from the vertex shader) by their homogeneous w component (the perspective divide). If the shadow coordinates fall within the visible spectrum, the shadow color is calculated by applying the GLSL 2D shadow lookup against the shadow map generated for this scene. (The shadow map is obtained by first rendering the scene in a shadow-map mode, where the scene is rendered as a gray-scale depth map from a camera positioned at the light source, and retaining the rendered frame.)
shadow = (sM glsl-shadow-2D sC)
The result of this calculation allows for correct exact shadows and self-shadows. The shadow calculation is then modified to create the soft-shadow effect.
The resulting shadow color is then modified by the application of a percentage-closer filtering (PCF) algorithm to create the first pass of the soft-shadowing system. This is applied by sampling the pixels surrounding the shadowed pixel to modify the shadow color value, thus blurring the outline of the shadow. The resulting blur, however, is pixelated by the limited volume of the PCF pixel-sampling calculations. I found optimal sampling results in a 1024×768 frame with a ratio of 0.001953125 (i.e. 1.0 / 512.0), performing 8 sampling operations (one for each surrounding pixel sampling volume).
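A sketch of that PCF pass, with the shadow-map lookup abstracted as a callback (the real implementation uses GLSL’s 2D shadow lookup), might read:

```cpp
#include <functional>

// Percentage-closer filtering: average the depth-compare results of the
// 8 texels surrounding the shadow coordinate, offset by 1.0 / 512.0.
// 'shadowCompare' stands in for the GLSL shadow-map comparison.
float pcfShadow(float s, float t,
                const std::function<float(float, float)>& shadowCompare)
{
    const float off = 1.0f / 512.0f;   // 0.001953125, per the text
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            if (dx == 0 && dy == 0) continue;   // 8 surrounding samples
            sum += shadowCompare(s + dx * off, t + dy * off);
        }
    return sum / 8.0f;   // blurred shadow factor in [0, 1]
}
```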
The resulting pixelization is then softened by two Gaussian-blur calculations: a vertical pass over the PCF-sampled pixel volumes, followed by a horizontal pass. The Gaussian blur was applied by first calculating the vertical and horizontal pixel offsets for the PCF pixel volumes, and then applying them to the pixels contained within those volumes by adding the color of the pixel, modified by the color of the offset pixel, to the existing shadow pixel color.
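A minimal sketch of one such separable blur pass over a gray-scale shadow buffer, assuming a simple 3-tap kernel rather than the engine’s exact weights:

```cpp
#include <algorithm>
#include <vector>

// Separable blur: call once with vertical = true, then once with
// vertical = false, to approximate a 2D Gaussian over the shadow buffer.
void blurPass(std::vector<float>& buf, int w, int h, bool vertical)
{
    const float k[3] = {0.25f, 0.5f, 0.25f};   // assumed 3-tap kernel
    std::vector<float> src = buf;              // read from a copy
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int i = -1; i <= 1; ++i) {
                int sx = vertical ? x : std::min(std::max(x + i, 0), w - 1);
                int sy = vertical ? std::min(std::max(y + i, 0), h - 1) : y;
                sum += k[i + 1] * src[sy * w + sx];
            }
            buf[y * w + x] = sum;
        }
}
// Usage: blurPass(shadow, 1024, 768, true); blurPass(shadow, 1024, 768, false);
```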
Once the lighting and shading calculations have completed, the resulting pixel color is the lighting color (already applied to the existing pixel color) multiplied by the shadow pixel color, if a shadow exists for this pixel; otherwise the lighting calculation alone is applied.
pixel = (light * shadow)
The goal behind the initial Xaede game engine was to take the lessons learned from the stage one and two prototypes and develop a ground-up first-person-shooter (FPS) game in C using the OpenGL application programming interface (API), with the Simple DirectMedia Layer (SDL) as the windowing-system interface (allowing the same code to run natively on Windows, Linux, and OS X).
Milestone: January 28, 2006
Began Xaede game engine implementation by creating a platform-independent type library, threading library, time and timing library, and socket handling library.
Milestone: February 03, 2006
Created the platform-independent file handling library, created first-in first-out (FIFO) stack and hash map libraries.
Milestone: February 12, 2006
Debugged the socket handling and file handling libraries. Created the texture library, loadable from BMP and TARGA images. Created the particle system, including an engine for particle processing with callback function algorithmic direction determination. Created a wrapper for the SDL true-type fonts (TTF) library, optimized against the OpenGL pipeline.
Particle Engine: Explosion
Particle Engine: Flame
Particle Engine: Fire
Milestone: February 17, 2006
Enabled WAV sounds using SDL. Created drivers for mouse and keyboard input. Solidified the particle engine system. Began creating particle engine effects – a flame effect and an explosion effect. Added OpenGL fog effects. Created a wrapper for the OpenGL lighting system.
Milestone: February 18, 2006
Created a static mesh system with internally-usable API. Added multi-texturing to the texture API. Added screen-shot rendering and recording to the rendering pipeline API. Added user-definable blending to the multi-texturing API.
Milestone: March 08, 2006
Created an algorithmic implementation of bump-mapping. Added an off-screen rendering buffer to act as a software-based frame-buffer. Created the first camera definition (an FPS camera using quaternions). Created a system to allow user-definable callback functions in the camera API. Added a system for stacking events for event-driven camera movement (e.g. follow-camera mode, etc.). Created an algorithmic implementation of parallax-mapping, finding that it is woefully inefficient compared to a shader-based implementation.
Static Mesh Render
Static Mesh Render
Combined Scene Render
Milestone: March 29, 2006
Created a texture-matrix manipulation API. Created a static 3D object to indicate camera position within the scene. Finalized the camera system with keyboard and mouse input for user-driven camera movement. Created a plugin to load Wavefront objects into the static mesh model for rendering and manipulation within the scene. Rewrote the polygon system (the supporting API layer under the static mesh) to contain any conceivable polygon type. Added an API function to the rendering pipeline for switching back and forth between 2D and 3D rendering modes. Added a render function to the mouse driver to indicate mouse screen position in 2D mode. Added a heads-up display (HUD) rendered as a cross-hair texture mapped against a single-sided polygon. Created the radial blur effect. Enhanced the off-screen rendering system. Added a camera effects system (e.g. render the scene from any angle and location, then blit the render to the current scene render) to support gun- and helmet-cameras.
Milestone: April 02, 2006
Performed further research into the SDL TTF system and found that fonts can be bound to OpenGL textures; however, the frame rate of the rendered output was the same. Enhanced the Wavefront object API to handle OpenGL materials, adding helper functions to load a material (.mtl) file if a corresponding one exists for an object (.obj) file. Corrected gimbal lock and invalid rotations in the camera system.
Combined Scene Render
Tree Static Mesh
Radial Blur Effect
Milestone: April 05, 2006
Re-created the HUD rendering system as a GUI widget. Moved the static polygon indicating position within a 3D scene to the GUI package. Created a billboarding system to correct the angle of the particles within the particle engine so they face towards the current camera.
Milestone: May 12, 2006
Worked to determine whether the inherent culling OpenGL performs against the visible frustum is sufficient for optimal frame rates, or whether a more sophisticated system should be implemented. Determined that it is not a very efficient system, and that one does not want to make calls to OpenGL when one does not have to (i.e. for coordinates that are outside the visible frustum). Made these determinations by loading large static meshes (Wavefront object files of 3rd-party game levels). Created a geometric-plane-based frustum culling system. Enhanced the static mesh API to use frustum culling, OpenGL display lists, and inverted rendering. Enhanced the Wavefront object plugin to handle heretofore unknown object formats and to gracefully handle missing and erroneous data.
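A geometric-plane-based culling test of this kind might look like the following sketch; the plane representation and the bounding-sphere test are assumptions, not the engine’s exact API:

```cpp
// A plane in ax + by + cz + d = 0 form; the frustum is six such planes,
// typically extracted from the combined model-view-projection matrix.
struct Plane { float a, b, c, d; };

// A mesh's bounding sphere is visible unless it lies entirely behind
// at least one of the six frustum planes.
bool sphereInFrustum(const Plane frustum[6],
                     float x, float y, float z, float radius)
{
    for (int i = 0; i < 6; ++i) {
        float dist = frustum[i].a * x + frustum[i].b * y +
                     frustum[i].c * z + frustum[i].d;
        if (dist < -radius)
            return false;   // fully outside this plane: cull the mesh
    }
    return true;            // inside or intersecting: submit to OpenGL
}
```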
Static Mesh Render
Physics System Render
Physics System Render
- Language: C++ (lines of code: 25,057)
- External libraries used: OpenGL, SDL, SDL-TTF, ODE
The second stage prototype essentially comprised an effort to take the lessons learned from the first stage of development and create a prototype game engine using the C++ programming language. This language was chosen in order to present a cohesive, object-oriented API to a potential future user of the game engine, allowing them to implement their game, including such things as graphics, gameplay management, and artificial intelligence, using a standard OO paradigm. However, what resulted from the stage two development was a nice framework for developing various game engine components, lacking the cohesive design elements that would have made the various independent elements a solid platform.
Since the use of GLUT was found to be severely limiting for the implementation of a game engine, it was decided that the Simple DirectMedia Layer (SDL) would be used to interface with windowing systems, and that OpenGL would be retained so that the lessons learned in the stage one development process could be carried into the second stage.
The second stage began with the implementation of classes to handle the management of frame rate, setting up the engine’s rendering environment, and managing the rendering process. This was quickly expanded, using code from the first stage as a model, to create classes for color management and dynamic polygon model management. These initial pieces were then used to render some of the simple polygon objects used in stage one. Additional components such as a lighting class and a timing class were added based upon the stage one components.
In order to be able to perform simple keyboard input, the SDL API was used to trap key events such as the escape key in order to make simple manipulations of the game engine environment while it was running.
The next step consisted of determining how texture mapping worked, using both OpenGL and the SDL API for loading and rendering bitmaps as they applied to object models. The texture class was created to load and process a bitmap file into a GL texture map.
In order to proceed to more complex polygon models, a new class was created to define what comprises a model within the Xaede game engine. Then, two model-importing classes were created that can natively read either 3DS or Milkshape model formats, and convert these to the Xaede model format for rendering and manipulation by the game engine.
Since the more complex models clearly illustrated a need for reading, initializing, and rendering textures, the texture class was enhanced to fulfill this need. Once the texture was registered with OpenGL, it could be reloaded & reinitialized from memory as needed, so the texture class acted as a crude caching mechanism for textures as well.
In order to determine when two models collided with each other, a simple collision-calculation class was implemented using a new three-dimensional vector math class. This also served as an excellent framework for experimenting with collision modeling and the various optimizations used to enhance collision-detection performance.
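The post does not record which collision model was used, but a bounding-sphere test is a minimal sketch of what such a class might compute:

```cpp
// Minimal 3D vector math of the kind the collision class relies on.
struct Vector3 {
    float x, y, z;
    Vector3 operator-(const Vector3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    float lengthSquared() const { return x * x + y * y + z * z; }
};

// Bounding-sphere test: two models collide when the distance between
// their centers is less than the sum of their radii. Comparing squared
// distances avoids the sqrt, a common collision-detection optimization.
bool spheresCollide(const Vector3& centerA, float radiusA,
                    const Vector3& centerB, float radiusB)
{
    float r = radiusA + radiusB;
    return (centerA - centerB).lengthSquared() < r * r;
}
```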
When two models collided, it was determined that it would be nice to see some sort of effect – like an explosion – when the event occurred. Several different particle engines were then written, before consolidating on a particle engine type from which generic effects could be derived, e.g. explosions, fire, etc.
At this point, further consideration was given to user input, including both mouse and keyboard input. The result was a keyboard class to manage the keyboard input state, as well as a mouse class to handle mouse input events. These were designed to be used by the game engine or its sub-components to obtain user input. However, lessons learned during implementation suggested that there were better ways to design both the API and some of the internal components of the user-input management classes.
Once a working methodology for obtaining user input was completed, it was noted that there was no simple way to display text that would work well on multiple platforms. Upon researching this issue, it was determined to use the TTF (TrueType fonts) extension to the SDL library to render text as a TTF on top of OpenGL. This turned out not to be quite as simple as first thought, but it was eventually discovered that the OpenGL initialization needed to happen in such a fashion that SDL surfaces could be blitted on top of the OpenGL screen. The font class was then created to handle this requirement. From the font class, the text class was derived to control the text displayed to the screen.
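One common pattern for this (assumed here; the font class’s actual internals are not shown) is to render the string to an SDL surface with SDL_ttf and upload it as an OpenGL texture to be drawn as a textured quad over the scene:

```cpp
#include <SDL/SDL.h>
#include <SDL/SDL_ttf.h>
#include <SDL/SDL_opengl.h>

// Render a TTF string to an SDL surface, then upload it as an OpenGL
// texture. (Pixel-format handling is simplified; blended glyph surfaces
// are 32-bit, and the exact channel order may vary by platform.)
GLuint textToTexture(TTF_Font* font, const char* text, int* w, int* h)
{
    SDL_Color white = {255, 255, 255, 0};
    SDL_Surface* surf = TTF_RenderText_Blended(font, text, white);
    if (!surf) return 0;

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, surf->w, surf->h, 0,
                 GL_BGRA, GL_UNSIGNED_BYTE, surf->pixels);

    *w = surf->w; *h = surf->h;   // caller uses these to size the quad
    SDL_FreeSurface(surf);        // pixels now live in the GL texture
    return tex;
}
```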
Given the current complex models, it was decided that the skeleton model manipulation system should be reimplemented in the stage two prototype, using the lessons learned from the first stage, to provide a consistent API for manipulating three-dimensional polygon objects.
Language: C++ (lines of code: 5,585)
External libraries used: OpenGL, SDL
- Base (framerate, GL lights, rendering pipeline, setup, timing)
- Format (3DS, milkshape)
- Graphics (colors, polygons, skeletal system, textures)
- Input (keyboard, mouse)
- Physics (vector 3D math)
In order to explore the OpenGL API and become more familiar with graphics library conventions and capabilities, the first prototype was implemented in the C programming language using the OpenGL Utility Toolkit (GLUT) for access to windowing-system calls. This initial GLUT prototype explored the creation and usage of triangle-, square- (or “quad”-), and vertex-list-based polygon objects and primitive models.
Essentially, it provided a platform for implementing different lighting models, simple texture application, and a primitive skeletal-based animation system. Also, managing user input was explored, and a simple console-based user-interface was implemented. Use of the keyboard to initiate object manipulation on the X, Y, and Z coordinate planes as well as the three-aspect rotation was also implemented.
Lessons learned from this prototype implementation include the following. GLUT has many limitations that essentially render it useless for implementing a game engine. Setting up multiple lights in OpenGL is easy; always anticipating all of the resulting effects is not. Rendering techniques such as blending and alpha-filtering need to be explored further. Polygon object models based upon complex polygons (e.g. vertex listing) are much slower to render with current hardware and graphics libraries than triangle- or quad-based polygon models. Implementing a skeletal animation system is fairly easy; creating a good API for using that system within the game engine source will be a non-trivial exercise.
Language: C (lines of code: 1,986)
External libraries used: OpenGL, GLU, GLUT
- frame rate
- graphics engine prototype
- black and white tiled floor static render
- user-interface (keyboard driver, console system)
- models (cube model, sphere model, generic polygon model)
- skeletal animation system
This game is a turn-based strategy title where a player manages their planets and distributed galactic resources to upgrade technology, build fleets, and execute space battles in order to conquer the galaxy. The game concept is loosely based upon the old Linux/KDE game “Konquest”, which I liked so much that I created this title with built-in A.I. to play against, since single-player play is impossible in Konquest unless you play against yourself. The A.I. system is based upon an old genetic-algorithm project of mine that was designed to find the most efficient strategy for playing the game. Play