Shader Extension
by LSnK
Latest version: 1.4
Last updated: 22/Dec/2010

Currently does not work with GM 8.1 onward.

Changelog
1.4: Fixed d3d_vs_create(), it was ignoring constants set by the 'def' instruction.
1.4: Changed VS configs so they work like the others. No overwriting without explicit definition.

1.3: Added z-bias control. Prevents z-fighting between coplanar polygons; useful for shadows and decals.
1.3: Added extended primitive functions which can draw up to 8192 vertices at once with position, normal, colour, specular, eight textures and eight sets of texture coordinates.
1.3: Added vertex shader 1.1 support and related functions. Due to GM's nefarious machinations, their use is limited to the extended primitive functions.
1.3: Added helper functions which directly set shader constants to GM-format colours.
1.3: Changed configs: PS/VS/tex configs are now independent.
1.3: Changed d3d_set_tex_all: now sets stage 0 too. The extended primitive functions use it.
1.3: Demo has been updated with a water post-processing effect. Press P.

1.2: Added d3d_dev_get_ps_version, checks pixel shader support.
1.2: Added d3d_set_tex_border, sets colour and alpha for tex_wrap_border mode.
1.2: d3d_set_tex* functions now accept -1 as the tex parameter. This unbinds the texture from the stage making it safe to delete the associated resource in GM.
1.2: Fixed the extension's fillmode_ constants. They were named incorrectly.
1.2: Fixed d3d_set_tex_all in the DLL version. It was defined with the wrong number of arguments.
1.2: Fixed d3d_set_fog_color setting the wrong colours. D3D's colour format is RGB, not BGR.

1.1: First release.

ABOUT
Pixel shaders are awesome: powerful, fast and compatible. Now you can use them directly in GM!

But that's not all. You also get:

    Alpha/Z testing
        Prevent pixels from being drawn using a customisable comparison. Alpha testing is often used in commercial games when drawing leaves and fences.
    Configurable exponential fog
        More realistic fog for outdoor scenes.
    Point primitive scaling and texturing
        Control the size of points and draw them with any texture applied.
    Write masking
        Control red, green, blue, alpha and depth writing individually.
    Multitexturing
        Pixel shaders allow you to draw up to six textures at once. Good for detail textures and more.
    Wireframe and point rendering
        Handy for testing. Point rendering obeys the point primitive settings.
    Z-bias
        Prevent z-fighting between geometry with the same coordinates. You could overlay wireframes on existing geometry without flickering, for example.
    Automatic normal-vector normalisation
        Fixes lighting problems when scaling models.
    Informational functions
        Get the max point and texture sizes, GPU name, shader support, free texture memory, etc. No more guessing.
    Extended primitives
        Draw primitives with up to 8192 vertices, each with position, normal, diffuse and specular colour/alpha and up to eight textures with eight sets of texture coordinates.
    Vertex shaders
        Manipulate extended primitives with per-vertex calculations, much like a pixel shader.

HOW TO USE
All functions work both in 2D and 3D mode where applicable.
Functions evaluate true on success, or false on failure. -1 specifically signifies an input error.

Initialisation

scr_shader_init()
DLL only. The extension does this for you automatically.

Information

d3d_dev_get_name()
GPU name.

d3d_dev_get_point_max_size()
Maximum point primitive size.

d3d_dev_get_ps_version()
Maximum supported pixel shader version, from 10 to 14 (that is, 1.0 to 1.4). A value below 10 means the GPU doesn't support pixel shaders.

d3d_dev_get_tex_max_width()
d3d_dev_get_tex_max_height()
Maximum texture size. Applies to all graphical resources.

d3d_dev_get_tex_max_stages()
Maximum simultaneous textures. Limits number of textures you can use at once. Usually 8 even on integrated GPUs.

d3d_dev_get_tex_mem()
Free texture memory in bytes. Approximate.
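
A minimal sketch of how the information functions might be used at startup to decide whether shaders are available. The variable names (ps_ver, use_shaders) are hypothetical, it assumes d3d_dev_get_name() returns a string, and treating 14 as the required version assumes your shaders start with ps.1.4:

```gml
// Create event of a controller object (hypothetical setup).
ps_ver = d3d_dev_get_ps_version();

if (ps_ver < 14)
{
    // Below 14 the GPU cannot run ps.1.4 shaders.
    // Below 10 it has no pixel shader support at all.
    show_message("No pixel shader 1.4 support on: " + d3d_dev_get_name());
    use_shaders = false;
}
else
{
    use_shaders = true;
}
```

The rest of the game can then test use_shaders before calling d3d_set_ps, so it still runs (without effects) on unsupported hardware.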

Pixel Shaders

d3d_ps_create( "assembly" )
Compile pixel shader from assembly source. Returns index for use in other shader functions.

d3d_ps_destroy( shader )
Free memory used by a shader, destroying it.

d3d_set_ps( shader )
Set current pixel shader or -1 for none. Shaders work like blend modes; one at a time, on or off. Shading is done before fog drawing and alpha testing.

d3d_set_ps_const( constant, r, g, b, a )
Set pixel shader constant register. There are 8 registers indexed 0-7 each containing 4 values between -1 and 1. Equivalent to using "def" in shader assembly.

d3d_set_ps_const_col( constant, col, alpha )
Set a constant to a GM-format colour. Handy shortcut.

d3d_set_ps_conf( conf )
Set PS constants from a predefined configuration. Unset values do not override current ones. Much faster than specifying settings individually.
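
Put together, the lifecycle of a pixel shader could look roughly like this. It is a sketch, not the demo's code: sh_fade and nl are hypothetical names, and building the source with explicit line breaks via chr() is an assumption about what the assembler accepts (a plain multi-line string may work just as well).

```gml
// --- Create event: compile once ---
nl = chr(13) + chr(10);
sh_fade = d3d_ps_create(
    "ps.1.4"         + nl +
    "texld r0, t0"   + nl +   // read the texture GM is drawing
    "mul   r0, r0, c0" + nl); // scale it by constant register c0

// --- Draw event: set constant, enable, draw, disable ---
d3d_set_ps_const(0, 1, 1, 1, 0.5);   // c0 = full colour, half alpha
d3d_set_ps(sh_fade);
draw_sprite(sprite_index, image_index, x, y);
d3d_set_ps(-1);                      // back to normal drawing

// --- Destroy / Game End event: free it ---
d3d_ps_destroy(sh_fade);
```

Setting c0 from outside rather than with "def" means the same compiled shader can be reused with a different alpha each step.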

Vertex Shaders

d3d_vs_create( "assembly" )
Compile vertex shader from assembly source. Returns index for use in other functions. The input (vn) registers are set up as follows:

    Reg     Data             Format
    
    v0:     Position         XYZ
    v3:     Normal           XYZ
    v5:     Diffuse colour   RGBA
    v6:     Specular colour  RGBA
    v7-v14: Texture coords   XY

d3d_vs_destroy( shader )
Free memory used by a shader, destroying it.

d3d_set_vs( shader )
Set current vertex shader or -1 for none. GM's functions are not affected. For the extended primitive functions, a vertex shader completely replaces transform, lighting, colouring and texturing.

d3d_set_vs_const( constant, x, y, z, w )
Set VS constant register. There are 96 4-component registers available.

d3d_set_vs_const_col( constant, col, alpha )
Set constant to GM-format colour.

d3d_set_vs_const_matrix( constant )
Sets four constants as the transposed world*view*projection matrix. You can then use m4x4 oPos,v0,cn in the shader to transform the vertices in keeping with GM's normal behaviour. If you use the d3d_transform functions make sure you call this after them, otherwise the shader transform will be wrong. Ex: 0 would set c0,c1,c2,c3; 4 would set c4,c5,c6,c7.

d3d_set_vs_conf( conf )
Set VS constant registers from a predefined configuration. Only the defined settings are changed.
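
As a rough sketch, a pass-through vertex shader that reproduces ordinary transformed, coloured, single-textured drawing might look like the following. The "vs.1.1" version line and the output registers oD0 (diffuse) and oT0 (texcoord set 0) come from standard Direct3D vertex shader assembly and are assumptions as far as this extension goes; vs_basic and nl are hypothetical names.

```gml
// --- Create event: compile a pass-through vertex shader ---
nl = chr(13) + chr(10);
vs_basic = d3d_vs_create(
    "vs.1.1"            + nl +
    "m4x4 oPos, v0, c0" + nl +   // position * world-view-projection (c0-c3)
    "mov  oD0,  v5"     + nl +   // pass diffuse colour through
    "mov  oT0,  v7"     + nl);   // pass first texcoord set through

// --- Draw event: call after any d3d_transform functions ---
d3d_set_vs_const_matrix(0);      // fills c0, c1, c2, c3
d3d_set_vs(vs_basic);
// ... draw extended primitives here ...
d3d_set_vs(-1);

// --- When no longer needed ---
d3d_vs_destroy(vs_basic);
```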

Textures

d3d_set_tex( stage, tex )
Set texture for this texture stage or -1 for none. Max simultaneous textures determines how many are available. GM overrides stage 0 whenever it draws something.

d3d_set_tex_all( tex )
Set texture for all texture stages. -1 unsets.

d3d_set_tex_int( stage, mode )
Set texture stage interpolation mode. Stage 0 is also controlled by texture_set_interpolation in GM. Use tex_int_ constant. Defaults to tex_int_nearest.
Modes:
tex_int_nearest: No filter. Pixelated.
tex_int_bilinear: Bilinear filter. Smooth.
tex_int_anisotropic: Angle-dependent filter. Currently useless, but will be properly supported in version 1.5.

d3d_set_tex_wrap( stage, xmode, ymode )
Set texture stage wrapping mode. Use tex_wrap_ constant. Works the same as texture_set_repeat(), except there are more options and independent X/Y controls.
Modes:
tex_wrap_normal: Repeats texture.
tex_wrap_mirror: Repeats texture, flipping at repeat boundaries.
tex_wrap_mirroronce: Mirrors once and clamps afterward.
tex_wrap_border: Solid colour/alpha is used if outside 0-1.
tex_wrap_clamp: No wrapping. Edge colour is used if outside 0-1.

d3d_set_tex_border( stage, col, alpha )
Set colour for tex_wrap_border.

d3d_set_tex_conf( conf )
Set texture stages from a predefined configuration. Only stages with settings defined in the conf are changed.

User-defined Configurations

These functions handle groups of settings. It's faster to set everything from a conf simultaneously than to set them all individually.

d3d_conf_ps_create()
Creates new pixel shader configuration and returns its index.

d3d_conf_ps_set( conf, constant, r, g, b, a )
Define PS constant state.

d3d_conf_vs_create()
Creates new vertex shader conf.

d3d_conf_vs_set( conf, constant, x, y, z, w )
Define VS constant state.

d3d_conf_tex_create()
Creates new texture stage conf.

d3d_conf_tex_set( conf, stage, tex, interp, xmode, ymode )
Define texture stage state.
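
For illustration, a conf is built once and then applied with a single call. The names conf_tint, conf_detail, tex_detail and the background resource bg_detail are hypothetical; the constant values are arbitrary.

```gml
// --- Create event: define the configurations once ---
conf_tint = d3d_conf_ps_create();
d3d_conf_ps_set(conf_tint, 0, 1, 0.8, 0.6, 1);     // c0: warm tint
d3d_conf_ps_set(conf_tint, 1, 0.5, 0.5, 0.5, 1);   // c1: mid grey

tex_detail  = background_get_texture(bg_detail);
conf_detail = d3d_conf_tex_create();
d3d_conf_tex_set(conf_detail, 1, tex_detail, tex_int_bilinear, tex_wrap_normal, tex_wrap_normal);

// --- Draw event: apply everything in two calls ---
d3d_set_ps_conf(conf_tint);      // only c0 and c1 change
d3d_set_tex_conf(conf_detail);   // only stage 1 changes
```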

Fog

d3d_set_fog_state( state )
Enable/disable fog without modifying fog settings.

d3d_set_fog_type( fog_type )
Set fog type. Use fog_type_ constants. GM normally uses fog_type_linear, but fog_type_exp is usually better.

d3d_set_fog_density( density )
Exponential fog density. 0-1.

d3d_set_fog_color( col )
d3d_set_fog_start( dist )
d3d_set_fog_end( dist )
Same as usual. You have to use these with exp/exp2 type fog or GM will override the type to linear.
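
A possible exponential fog setup, as a sketch. The density and distances are arbitrary example values, and the call order shown is an assumption rather than a documented requirement:

```gml
// Draw event, 3D mode: grey exponential fog.
d3d_set_fog_type(fog_type_exp);
d3d_set_fog_density(0.02);   // 0-1; higher is thicker
d3d_set_fog_color(c_gray);
d3d_set_fog_start(100);
d3d_set_fog_end(2000);
d3d_set_fog_state(true);

// ... draw the scene ...

d3d_set_fog_state(false);    // settings are kept for next time
```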

Point Primitives and Sprites

d3d_set_point_size( size )
Set size of pr_pointlist primitives. Points always face the camera and are unaffected by standard rotation/scaling methods.

d3d_set_point_size_min( size )
d3d_set_point_size_max( size )
Set size clamp, useful for scaled points in 3D mode. Defaults are 1 and 64.

d3d_set_point_scale( state )
Enable/disable point scaling. This scales points based on their distance from the camera. You must configure the scaling with d3d_set_point_scale_coef.

d3d_set_point_scale_coef( coef1, coef2, coef3 )
Configure point scaling formula. Defaults to (1,0,0).
Formula: size * sqrt(1/( coef1 + (coef2*distancetocamera) + (coef3*sqr(distancetocamera)) ))

d3d_set_point_sprite( state )
Enable/disable texture mapping for point primitives.
NOTE: You must define a texture for the primitive with draw_primitive_begin_texture or similar. Texcoords are ignored. 0,0 -> 1,1 covers the entire point irrespective of your inputs.
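
To get a feel for the scaling formula given for d3d_set_point_scale_coef, it can be written out as a small script. scr_point_scaled_size is a hypothetical helper, not part of the extension, and it only mirrors the formula as stated in this document:

```gml
// --- Script: scr_point_scaled_size(size, coef1, coef2, coef3, dist) ---
// Returns the on-screen size predicted by the documented formula.
var denom;
denom = argument1 + (argument2 * argument4) + (argument3 * sqr(argument4));
return argument0 * sqrt(1 / denom);

// --- Draw event (separate from the script above) ---
// Inverse-distance falloff: points shrink as they get further away.
d3d_set_point_scale_coef(0, 0, 1);
d3d_set_point_scale(true);
```

With the default coefficients (1,0,0) the denominator is 1, so the size is unchanged at any distance. With (0,0,1), a point of size 32 at distance 4 comes out as 32 * sqrt(1/16) = 8.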

Render Control

d3d_set_mask( r, g, b, a )
Enable/disable writing of each colour channel independently. Draw on a surface without affecting its alpha channel, for example.

d3d_set_zwrite( state )
Enable/disable depth buffer writing. Same as above except for depth.

d3d_set_zbias( bias )
Offsets drawing depth, allowing polygons with the same position to be drawn without z-fighting / flickering artifacts. Useful for shadows, decals, etc.
Objects drawn with higher values appear in front. Integer 0-16, 0 disables. 0 by default.

d3d_set_alphatest( value, mode )
Prevent drawing of pixels that fail the given alpha test. Value is 0-255. Use cmp_ constants for mode. Pass -1 as value to disable alpha testing. Any positive value enables. Because it causes fewer pixels to be drawn, alphatesting can improve performance.
For example, if you set (128,cmp_greaterequal) then only pixels with at least half opacity will be drawn.

d3d_set_ztest( mode )
Prevents drawing of pixels that don't meet the given depth criteria. Comparison is between current depth buffer and pixel depth. Use cmp_ constants. Defaults to cmp_lessequal.

d3d_set_fillmode( mode )
Render only points, wireframes or filled. Use fillmode_ constant. Defaults to fillmode_solid.

d3d_set_normal_auto( state )
Enable/disable automatic normal-vector normalisation. Should solve problems with model lighting when scaling.
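
Two common uses of the render control functions, sketched with GM's own d3d_draw_wall and d3d_draw_floor. The texture ids (tex_leaves, tex_floor, tex_decal) are hypothetical, and passing the same mode when disabling the alpha test is an assumption, since only the value is documented as mattering there:

```gml
// Draw event (3D mode).

// 1. Foliage: skip pixels that are less than half opaque.
d3d_set_alphatest(128, cmp_greaterequal);
d3d_draw_wall(0, 0, 64, 64, 0, 0, tex_leaves, 1, 1);
d3d_set_alphatest(-1, cmp_greaterequal);   // -1 disables the test

// 2. Decal on a floor at z = 0: same plane, pulled in front with z-bias.
d3d_draw_floor(0, 0, 0, 256, 256, 0, tex_floor, 4, 4);
d3d_set_zbias(1);
d3d_draw_floor(96, 96, 0, 160, 160, 0, tex_decal, 1, 1);
d3d_set_zbias(0);                          // 0 disables
```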

Extended Primitives

d3d_primitive_begin_ext( primitive, textured )
Begin drawing an extended primitive. 8192 vertices max. Textured is true or false: you control which textures to use with the d3d_set_tex functions. To draw untextured, set to false and call d3d_set_tex_all(-1).

NOTE: When drawing with texcoords things work a little differently. Obviously I couldn't make a function with 22 arguments, so you have to specify coords separately for each vertex with d3d_vertex_ext_tex.

Drawing without texcoords works the same as usual:

d3d_primitive_begin_ext( pr_trianglestrip, false )
  d3d_vertex_ext()
  d3d_vertex_ext()
  d3d_vertex_ext()
d3d_primitive_end_ext()

But for textured ext primitives you must do this:

d3d_primitive_begin_ext( pr_trianglestrip, true )
  d3d_vertex_ext()
  d3d_vertex_ext_tex()     // Define up to 8 sets of texcoords, one per stage.
  d3d_vertex_ext_next()    // Finalises the current vertex.
  d3d_vertex_ext()
  d3d_vertex_ext_tex()
d3d_primitive_end_ext()    // Draw

Since these functions don't affect the texture stages themselves, you can still draw textured primitives without specifying coordinates regardless of whether "textured" is true or false. This is by design, since you might want to specify the coords in a vertex shader instead.

d3d_vertex_ext( x,y,z, nx,ny,nz, col,alpha, speccol,specalpha )
Position, normal, diffuse/specular colour and alpha. The specular has no effect unless you make use of it in a shader.

d3d_vertex_ext_tex( stage, xtex, ytex )
Set vertex texture coordinates. There are eight sets indexed 0-7, one for each texture stage.

d3d_vertex_ext_next()
When drawing with textures, call when finished with the current vertex to start defining the next one.

d3d_primitive_end_ext()
Draw the primitive.

draw_primitive_begin_ext( primitive, textured )
draw_vertex_ext( x,y, col,alpha, speccol,specalpha )
draw_vertex_ext_tex( stage, xtex, ytex )
draw_vertex_ext_next()
draw_primitive_end_ext()

Same thing in two dimensions.
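
With real arguments filled in, a textured extended primitive might look like this. It is a sketch: tex_base and tex_detail are hypothetical texture ids, the coordinates are arbitrary, and it follows the pattern above of calling d3d_vertex_ext_next() between vertices but not after the last one. Stage 1 will normally only be visible if a pixel shader reads it.

```gml
// Bind textures to stages 0 and 1.
d3d_set_tex(0, tex_base);
d3d_set_tex(1, tex_detail);

d3d_primitive_begin_ext(pr_trianglelist, true);

// Vertex 1: position, normal, diffuse colour/alpha, specular colour/alpha.
d3d_vertex_ext(0, 0, 0,   0, 0, 1,  c_white, 1,  c_black, 0);
d3d_vertex_ext_tex(0, 0, 0);      // stage 0 coords
d3d_vertex_ext_tex(1, 0, 0);      // stage 1 coords (tiled 4x)
d3d_vertex_ext_next();

// Vertex 2
d3d_vertex_ext(64, 0, 0,  0, 0, 1,  c_white, 1,  c_black, 0);
d3d_vertex_ext_tex(0, 1, 0);
d3d_vertex_ext_tex(1, 4, 0);
d3d_vertex_ext_next();

// Vertex 3
d3d_vertex_ext(0, 64, 0,  0, 0, 1,  c_white, 1,  c_black, 0);
d3d_vertex_ext_tex(0, 0, 1);
d3d_vertex_ext_tex(1, 0, 4);

d3d_primitive_end_ext();          // draw

d3d_set_tex_all(-1);              // unbind when done
```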

ABOUT SHADERS

As you can see in my demo, shaders are extremely fast and flexible. You can perform tone mapping, colour control, refraction, blurring, and many other advanced techniques.

This extension supports shader model 1.4. This is relatively limited compared with newer versions but highly compatible, and still much more powerful than GM's default raster effect capabilities. Being nine years old has its advantages; pretty much every GPU on the entire planet supports it, even that crappy old Intel GMA950 in your netbook.

WRITING SHADERS

Shaders are written in assembly language. I'm not going to explain every little thing here, but I'll give you a general overview.

A shader has registers to hold data and instructions which perform various operations on them. Registers are like variables; instructions are like functions. Each register has four values; this can represent an RGBA colour, texture coordinates, or whatever you want.

A pixel shader executes once for every single pixel drawn while it's enabled.

Here's a simple shader. This gives the same output as normal drawing.

ps.1.4           // Declares shader version.  Mandatory for all shaders.
texld r0, t0     // Reads texture stage 0 at coords t0.
mul   r0, r0,v0  // Multiplies texture by drawing colour and stores result in r0.  Done.

This halves the alpha of the texture, ignoring image_blend/image_alpha:

ps.1.4
def   c0, 1,1,1,0.5  // Defines constant register.
texld r0, t0 
mul   r0, r0,c0      // Multiplies constant register by texture.

See the demo GMK for more complex and useful examples. Play around with them - you can't break anything, but you can learn a lot more than by reading the specs alone.

Registers

Temporary registers

r0, r1, r2, r3, r4, r5. Read/write.
Registers which can store the results of instructions. They have a guaranteed range of at least -8 to +8, usually way way larger on modern hardware.

r0 also acts as the output register. When the shader is finished r0 is drawn to the screen.

Constant registers

c0, c1, c2, c3, c4, c5, c6, c7. Read-only.
Holds user-defined values between -1 and 1. Constant registers can be set by the def instruction or from outside the shader with d3d_set_ps_const().

Texture address registers

t0, t1, t2, t3, t4, t5. Read-only.
Holds texture coordinates for the current pixel. Because of how GM works you'll only be using t0, which contains the current texture coordinates for whatever texture is being drawn. You can only use these registers with texture instructions. Note: You can use all of them with the extended primitive functions.

Colour registers

v0, v1. Read-only.
Holds the vertex colours for this pixel. 0,0,0,1 is opaque black, 1,1,1,1 is opaque white. For example, when drawing a sprite v0 contains image_blend (RGB) and image_alpha (A). v1 contains the specular colour, only used with extended primitives.

Instructions

Arithmetic Instructions
The arguments for these can be temporary, constant or colour registers.

mov rn, src
The most basic instruction. Copies src to rn.

add rn, src1,src2
Add src1 and src2, write to rn.

sub rn, src1,src2
Subtract src2 from src1.

mul rn, src1,src2
Multiply src1 by src2.

mad rn, src1,src2, src3
Multiplies src1*src2, then adds src3.

lrp rn, src1, src2,src3
Linear interpolation between src3 and src2 by amount src1: 0 gives src3, 1 gives src2. In other words: (src3 + ((src2-src3)*src1)).

cnd rn, src1, src2,src3
if (src1 > 0.5) {rn = src2} else {rn = src3}

cmp rn, src1, src2,src3
if (src1 >= 0) {rn = src2} else {rn = src3}

dp3 rn, src1,src2
Calculates dot product of the RGB components of the two inputs: (src1.r*src2.r)+(src1.g*src2.g)+(src1.b*src2.b). Since the dot is a single value, it writes the same result to all components of the destination register.

dp4 rn, src1,src2
Same as dp3, but calculates the dot of all four components.
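
A typical use of dp3 is converting a colour to greyscale by taking the dot product with a set of luminance weights. The sketch below wraps such a shader in GML; sh_grey and nl are hypothetical names, the weights are the commonly used 0.299/0.587/0.114 set, and building the source with chr() line breaks is an assumption about what the assembler accepts.

```gml
// --- Create event ---
nl = chr(13) + chr(10);
sh_grey = d3d_ps_create(
    "ps.1.4"                           + nl +
    "def   c0, 0.299, 0.587, 0.114, 0" + nl +   // luminance weights
    "texld r0, t0"                     + nl +   // read the texture
    "dp3   r0.rgb, r0, c0"             + nl);   // brightness into R, G and B

// --- Draw event ---
d3d_set_ps(sh_grey);
draw_sprite(sprite_index, image_index, x, y);
d3d_set_ps(-1);
```

The .rgb mask leaves r0's alpha as read from the texture. Since v0 is never used, image_blend and image_alpha are ignored.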

Texture Instructions

texld rn, tn
Reads colour data from the associated texture stage at coordinates tn and places the result in rn.
Loading into r0 reads texture stage 0. r1 reads stage 1, r2 stage 2, etc.
r0,t0 is the current texture GM is drawing. Surface, sprite, background, whatever.
In phase 2 you can use rn registers as coordinates.

texcrd rn, tn
Like texld, except it copies the texture coordinates instead of reading the texture. You have to use a .rgb destination writemask with this instruction. (see below)

texkill tn
Prevents the pixel from being drawn if any of tn's RGB components are < 0.
In phase 2 you can use rn registers as coordinates.

texdepth r5
Writes to the zbuffer. I've never used this so you'll have to check the MSDN documentation. See below.

Instruction modifiers

You can modify the results of arithmetic instructions by appending modifiers to them.

_x2, _x4, _x8.
Multiply result by factor. Ex: add_x2 would add then multiply the result by 2.

_d2, _d4, _d8.
Divides result by factor.

_sat
Clamps result into the range 0-1. You can combine this with another modifier, ex: add_x2_sat.

Temporary register modifiers

You can modify the inputs to an arithmetic instruction if they come from a temporary register. These only affect the result, they don't change the register itself.

rn_bias
Subtracts 0.5 from input.

1-rn
Inverts input.

-rn
Negates input.

rn_x2
Multiplies input by 2.

rn_bx2
Subtracts 0.5 and multiplies by 2, in that order. This turns data in the range 0-1 into -1-1.

Destination/source masks

You can decide which channels of a register are written to (destination) or read from (source). The syntax for this is: rn.rgba.
Any combination is allowed, but the masks must be given in order. Ex: rn.a, rn.rg, rn.ra, rn.gba, etc.
Using the rn.rgba mask gives the same result as not specifying any mask.

Instruction count and order

Instructions must be given in the following order:
Version
Constant
Texture
Arithmetic
Phase
Texture
Arithmetic

You can issue up to 6 texture instructions and 8 arithmetic instructions per phase. Constants are optional. Phase is optional as well; if you don't use it the default phase is 2. You can only access the vn registers and use rn as texture coordinates in phase 2. The phase instruction clears the alpha channel of all rn registers.
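
As an illustration of the two-phase layout, here is a sketch of a simple distortion effect: phase 1 computes offset texture coordinates, and phase 2 uses them to read the texture being drawn. The names sh_warp, nl and tex_offsets are hypothetical, the offset strength in c0 is arbitrary, and the shader has not been checked against the demo.

```gml
// --- Create event ---
nl = chr(13) + chr(10);
sh_warp = d3d_ps_create(
    "ps.1.4"                        + nl +
    "texld  r1, t0"                 + nl +   // stage 1: offset map
    "texcrd r0.rgb, t0"             + nl +   // copy the coordinates
    "mad    r0.rgb, r1_bx2, c0, r0" + nl +   // coords + offset * strength
    "phase"                         + nl +
    "texld  r0, r0"                 + nl);   // stage 0 at the shifted coords

// --- Draw event ---
d3d_set_tex(1, tex_offsets);            // offset map on stage 1
d3d_set_ps_const(0, 0.05, 0.05, 0, 0);  // c0: distortion strength
d3d_set_ps(sh_warp);
draw_sprite(sprite_index, image_index, x, y);
d3d_set_ps(-1);
d3d_set_tex(1, -1);
```

The r1_bx2 modifier turns the 0-1 offset map into -1 to 1, so mid-grey in the map means no displacement.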

One other thing! If you use an instruction only on the RGB channels, you can simultaneously execute an instruction on the alpha channel by co-issuing it. The two then count as a single instruction, increasing performance. + signifies co-issue. Example:

 ps.1.4
 texld r0,     t0
 texld r1,     t0
 mov   r0.rgb, r1
+mov   r1.a,   r0

