One more HLSL trick

07/07/2011

While implementing a fairly complex pixel shader for multi-cascade shadow maps on shader model 2.0, I got the usual error message, “Arithmetic instruction limit of 64 exceeded”, so once again I had to free up some instruction slots.

I had code that tries to find the best shadow map for the current pixel, something like:

//float2 map[] is a u/v pair; if it is "inside" the texture, I can use it
if (map[0].x < 0 || map[0].x > 1 ||
    map[0].y < 0 || map[0].y > 1)
{
  //...
}

This code was converted to:

//float2 map[] is a u/v pair; if it is "inside" the texture, I can use it
if( any (map[0]-saturate(map[0] ) ) )
{
  //...
}

This takes far fewer instructions, so now I have some free slots!
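To check that the two conditions really agree, you can emulate them on the CPU. The sketch below is Python, not HLSL, and `saturate`, `outside_naive` and `outside_trick` are hypothetical helper names that only mimic the intrinsics. It checks the logic and says nothing about instruction counts.

```python
def saturate(v):
    """Clamp each component to [0, 1], like the HLSL saturate intrinsic."""
    return tuple(min(max(c, 0.0), 1.0) for c in v)

def outside_naive(uv):
    """The original four-comparison test."""
    return uv[0] < 0 or uv[0] > 1 or uv[1] < 0 or uv[1] > 1

def outside_trick(uv):
    """any(uv - saturate(uv)): a non-zero component means clamping changed it."""
    return any(c - s != 0 for c, s in zip(uv, saturate(uv)))

# A few u/v pairs: inside, on the border, and outside the texture.
samples = [(0.5, 0.5), (0.0, 1.0), (-0.1, 0.5), (0.5, 1.2), (2.0, -3.0)]
results = [(uv, outside_naive(uv), outside_trick(uv)) for uv in samples]
for uv, naive, trick in results:
    print(uv, naive, trick)
```

Both columns should match for every sample, including the border values 0 and 1, which both versions treat as inside.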

-1.#IND000000000000

05/11/2011

Q: In Visual Studio, how do I set a conditional breakpoint on -1.#IND000000000000?

 variable == -1.#IND000000000000 //this cannot work, of course

A:

variable != variable
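The answer works because -1.#IND is how the Microsoft runtime prints an indeterminate NaN, and a NaN compares unequal to every value, including itself. A minimal Python sketch of the same comparison (`is_nan_by_self_compare` is a hypothetical name used only for illustration):

```python
import math

def is_nan_by_self_compare(x):
    """NaN is the only floating-point value that is not equal to itself."""
    return x != x

indeterminate = math.inf - math.inf  # inf - inf yields NaN under IEEE 754
print(is_nan_by_self_compare(indeterminate))  # True
print(is_nan_by_self_compare(1.5))            # False
```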

More productive programming

04/20/2011

(inspired by 10 best tricks of fooling myself to work and Create more productive environment at your desk)

  1. Drink good tea constantly.
    It helps you concentrate. I recommend Pu-erh or Oolong.
  2. Drink your tea without sugar.
    Not only does it taste much better (this is the only reasonable way to drink tea), but when it is accidentally spilled on a keyboard, the keyboard is recoverable and the keys do not stick. Thus you will be significantly more productive than somebody who drinks tea or coffee with sugar (or Pepsi or whatever).
  3. Keep your browser closed; get updates using RSS readers.
    I used to spend a lot of time checking news sites, /., HN, Merriam-Webster’s Word of the Day, German word of the day, my Wikipedia watchlist, etc. each time “the code is compiling”, and then kept on surfing. Now RSS saves me a lot of time.
  4. Ask a co-worker for help if you are stuck on a problem for more than 30 minutes.
    Usually you will find the solution even before you finish explaining the issue to him. If not, you give him a chance to solve your problem and feel like a genius.
  5. Use Ginkgo biloba, if you still feel dumb.
    I don’t know whether its effect is purely psychosomatic or not, but it helps.

HLSL trick #2

04/14/2011

Saving some expensive instructions in shader model 2…
(Trick #1 is described here)

Instead of, e.g.,

float getTotalDiffuse()
{
 float l1 = getDiffuse(g_light1);
 float l2 = getDiffuse(g_light2);
 float l3 = getDiffuse(g_light3);
 float l4 = getDiffuse(g_light4);
 return l1+l2+l3+l4;
}

try

float getTotalDiffuse()
{
 float4 l = {getDiffuse(g_light1),
              getDiffuse(g_light2),
              getDiffuse(g_light3),
              getDiffuse(g_light4)};
 return dot (l, float4(1,1,1,1));
}

The trick is that when you want to sum several floats (or vectors), it may be cheaper to take a dot product with the vector (1, … , 1).
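The identity behind the trick is easy to check on the CPU. This is a Python sketch with a hypothetical `dot` helper and made-up light values. The likely reason it saves slots is that a four-component dot product can compile to a single dp4 instruction while the plain sum needs three adds, but that is my assumption, so check the disassembly for your own shader.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical per-light diffuse terms (exact binary fractions on purpose).
lights = (0.25, 0.5, 0.125, 1.0)

total_by_adds = lights[0] + lights[1] + lights[2] + lights[3]
total_by_dot = dot(lights, (1.0, 1.0, 1.0, 1.0))
print(total_by_adds, total_by_dot)  # 1.875 1.875
```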

HLSL trick

03/02/2011

If you write HLSL for Shader Model 2, you know you are limited to 32 constant registers. Here is a trick that helped me save one.

Suppose you want to write this in a pixel shader:

 float4x4 l_Color=
   {
      tex2D(samTex0, In.Tex0),
      tex2D(samTex1, In.Tex1),
      tex2D(samTex2, In.Tex2),
      tex2D(samTex3, In.Tex3)
   };
 ...
   l_Color[g_BlendLayer] = foo(l_Color[g_BlendLayer]); 
   //OOPS! illegal syntax
   out.Color = combine (l_Color);

But in Shader Model 2 you cannot write “array[index] = blah”.

Second try:

   ...
  float4 blended = foo(l_Color[g_BlendLayer]);
   switch (BlendLayer)
   {
      case 0: l_Color[0] = blended; break;
      case 1: l_Color[1] = blended; break;
      case 2: l_Color[2] = blended; break;
      case 3: l_Color[3] = blended; break;
   }
   //OOPS! illegal syntax
   ...

Of course, there is no switch/case in ShaderModel 2…

   ...
  float4 blended = foo(l_Color[g_BlendLayer]);
   switch (BlendLayer)
   {
      case 0: l_Color[0] = blended; break;
      case 1: l_Color[1] = blended; break;
      case 2: l_Color[2] = blended; break;
      case 3: l_Color[3] = blended; break;
   }
   //OOPS! illegal syntax
   ...

Same problem: “array[index] = blah”.

   ...
  float4 blended = foo(l_Color[g_BlendLayer]);
  if (0 == BlendLayer) l_Color[0] = blended; else
  if (1 == BlendLayer) l_Color[1] = blended; else
  if (2 == BlendLayer) l_Color[2] = blended; else
  if (3 == BlendLayer) l_Color[3] = blended; 
   //OOPS! Too bad
   ...

Although this code is syntactically correct, it is a poor choice: conditions are expensive, and nested ifs are even worse (just look at the disassembly). Too many instructions.

   ...
  float4 blended = foo(l_Color[g_BlendLayer]);
  if (0 == BlendLayer) l_Color[0] = blended; 
  if (1 == BlendLayer) l_Color[1] = blended; 
  if (2 == BlendLayer) l_Color[2] = blended; 
  if (3 == BlendLayer) l_Color[3] = blended; 
   //Much better!
   ...

This code (the same as the previous one, but without the “else”s) is much better. But I still get “error X5589: Invalid const register num: 32. Max allowed is 31.”

   ...
  float4 blended = foo(l_Color[g_BlendLayer]);
  if (0 == BlendLayer--) l_Color[0] = blended; 
  if (0 == BlendLayer--) l_Color[1] = blended; 
  if (0 == BlendLayer--) l_Color[2] = blended; 
  if (0 == BlendLayer--) l_Color[3] = blended; 
   //It works!
   ...
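The post does not spell out why this form needs fewer constant registers, so the sketch below only checks that the post-decrement version selects the same layer as the plain equality tests. It is Python, and `select_by_equality` and `select_by_decrement` are hypothetical names for the two variants.

```python
def select_by_equality(blend_layer, n=4):
    """Layers written by: if (i == BlendLayer) for i = 0..n-1."""
    return [i for i in range(n) if i == blend_layer]

def select_by_decrement(blend_layer, n=4):
    """Layers written by: if (0 == BlendLayer--), repeated n times."""
    written = []
    for i in range(n):
        if blend_layer == 0:   # the comparison sees the old value...
            written.append(i)
        blend_layer -= 1       # ...then the post-decrement happens
    return written

table = [(b, select_by_equality(b), select_by_decrement(b)) for b in range(4)]
for row in table:
    print(row)
```

Every comparison in the second variant is against the same literal 0, which is the only difference the sketch makes visible.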

Reading the generated assembly really helps.
Here is how to get .asm listings from your HLSL shaders
using Microsoft’s fxc tool:

fxc /Gfp /Zi /T ps_2_0 /Fc out.asm l:\efx\mini.fx /E PS_Test

or, to get nice HTML:

fxc /Gfp /Zi /T ps_2_0 /Cc /Fc out.asm.html l:\efx\mini.fx /E PS_Test

words….

02/21/2011

unordered facts.

Etymology of etymology: Greek ἐτυμολογία (etumologíā); from ἔτυμον (étumon), meaning “true sense”, and -λογία (-logía), meaning “study”; from λόγος (lógos), meaning “speech, account, reason.” (wikipedia)

Pronunciation of pronunciation is IPA: /pɹəˌnʌnsiˈeɪʃən/, SAMPA: /pr@%nVnsi"eIS@n/ (wiktionary)

Doublet for doublet is twin.

The term antonym is synonymous with opposite.

Antonym to antonym is synonym.

There is no synonym to synonym, AFAIK.

TLA is TLA. (Three-letter acronym)

FLAB is FLAB (Four-letter abbreviation)

Onomatopoeic is not onomatopoeic.

Awkward is an awkward word.

RAS syndrome is an example of RAS syndrome.

Portmanteau is Portmanteau.

hm.

Heterological is neither heterological nor autological.


Puzzle

08/24/2010

Playing with JavaScript.

There are ten matchsticks. You must move them so that they form 5 crossed pairs. On each turn you can move one matchstick over exactly two other matchsticks (a crossed pair counts as two!).
10-matchsticks puzzle
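For the impatient, the puzzle is small enough to brute-force. The Python sketch below rests on my own reading of the rules: the matchsticks start in a row of ten positions, a moved matchstick must land on a single matchstick to form a cross, and empty positions do not count as jumped. `moves` and `solve` are hypothetical helper names.

```python
def moves(state):
    """Yield (src, dst, new_state) for every legal move.

    state[i] is the number of matchsticks at position i (0, 1 or 2).
    """
    n = len(state)
    for src in range(n):
        if state[src] != 1:
            continue                      # only single matchsticks may move
        for step in (-1, 1):
            jumped = 0
            pos = src + step
            while 0 <= pos < n:
                if state[pos] == 0:
                    pass                  # empty gaps are skipped
                elif jumped == 2:
                    if state[pos] == 1:   # land on a single matchstick
                        new = list(state)
                        new[src], new[pos] = 0, 2
                        yield src, pos, tuple(new)
                    break
                else:
                    jumped += state[pos]  # a crossed pair counts as two
                    if jumped > 2:
                        break
                pos += step

def solve(state, path=()):
    """Depth-first search; returns a list of 1-based (from, to) moves or None."""
    if all(count != 1 for count in state):
        return list(path)
    for src, dst, new in moves(state):
        result = solve(new, path + ((src + 1, dst + 1),))
        if result is not None:
            return result
    return None

solution = solve((1,) * 10)
print(solution)
```

Each move removes two single matchsticks, so any solution takes exactly five moves and the search stays tiny.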

999…

02/08/2010

Question

Let N be any positive integer divisible by neither 2 nor 5.
Does there exist a k (for each such N) such that 10^k – 1 is divisible by N?
Or: for any N coprime with 10, is there a number of the form 99..9 that is divisible by N?

Answer

Yes. The smallest such k is the multiplicative order of 10 modulo N. The sequence can be found in The On-Line Encyclopedia of Integer Sequences.

Proof

Trivial.

Code

-- all numbers that cannot be divided by 2 or 5
seq1 :: [Integer]
seq1 = filter (\a -> (a `mod` 10) `elem` [1,3,7,9]) [1..]

-- find the smallest k such that 10^k - 1 (99..9 with k nines) can be divided by n
findNum :: Integer -> Integer
findNum n = head [x | x <- [1..], (10^x - 1) `mod` n == 0]

-- prints the first 100 terms of the sequence
main :: IO ()
main = print $ take 100 $ map findNum seq1
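The same search can be done without building huge powers of ten by tracking only the remainder of 10^k modulo N. Because 10 is coprime with N, Euler's theorem guarantees the remainder eventually returns to 1, so the loop terminates. This is a Python sketch, and `order_of_10` is a hypothetical name.

```python
from math import gcd

def order_of_10(n):
    """Smallest k >= 1 such that 10**k - 1 is divisible by n."""
    if n < 1 or gcd(n, 10) != 1:
        raise ValueError("n must be a positive integer coprime with 10")
    k, r = 1, 10 % n
    while r != 1 % n:        # 1 % n handles n == 1, where every remainder is 0
        r = (r * 10) % n
        k += 1
    return k

print([order_of_10(n) for n in (1, 3, 7, 9, 11, 13)])  # [1, 1, 6, 1, 2, 6]
```

For example, 999999 = 7 * 142857, and no shorter run of nines is divisible by 7.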

The Treachery of Computer Images

01/10/2010

This is not a painting

Hello, World! program in HLSL

12/17/2009

If anyone has been missing a real “Hello, World” program written in a shader language, here it is.

It is an HLSL pixel shader for shader model 3.0 (in an effects file); it is a pixel shader only, and its only input is u,v.

A more serious challenge would be to write a quine, but I have other things to do.

const int L[18]={
	0xad27,	0xa925,	0xED25,	0xa925,	0xaDB7,	0x0000,
	0x85eE,	0x852b,	0xad2b,	0xf92e,	0x712b,	0x51e9,	0x0000,
	0xC3CF,	0xc36f,	0xc326,	0xc360,	0xf3cf
};
float4 PS(float2 tex : TEXCOORD0):COLOR0
{
	float4 output = float4(0,tex.x,tex.y,1);
	int x= (1-tex.x)*16;
	int y= (1-tex.y)*18;
	//no bitwise operations in shader model 3 yet :(
	int mask=L[y]*2;
	//emulate a right shift by x: halve the mask x times
	for (int i = 0; i<16&& i <x; i++)
		mask*=0.5;
	output.r = ( frac(0.5*mask) < 0.1);
	return output;
}
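To see what the table encodes without running a shader, you can decode it on the CPU. The Python sketch below assumes each 16-bit entry is one row of a 16x18 bitmap with the most significant bit as the leftmost pixel. That is my reading of the table, not something the shader code states, and `render_row` is a hypothetical helper.

```python
# The same 18 rows as the shader's L[] table.
L = [
    0xAD27, 0xA925, 0xED25, 0xA925, 0xADB7, 0x0000,
    0x85EE, 0x852B, 0xAD2B, 0xF92E, 0x712B, 0x51E9, 0x0000,
    0xC3CF, 0xC36F, 0xC326, 0xC360, 0xF3CF,
]

def render_row(word, width=16):
    """Turn one 16-bit row into text: '#' for a set bit, '.' for a clear one."""
    return "".join(
        "#" if (word >> bit) & 1 else "."
        for bit in range(width - 1, -1, -1)   # most significant bit first
    )

rows = [render_row(word) for word in L]
print("\n".join(rows))
```

Under that assumption the first five rows spell HELLO, with the two all-zero entries acting as blank lines between the rows of text.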