Tuesday, May 31, 2011
Friday, May 27, 2011
Second Anniversary
To mark the second anniversary of my Russian blog, I have prepared some screenshots:
The code is still far from perfect, the images have a lot of artifacts (fireflies), and there are numerous things that I want to develop, debug and polish. But for the first time, soft shadows are implemented using the power of Direct3D 11.
Wednesday, May 18, 2011
Packed Stream Output
There is some inconsistency in D3D 10/11 hardware: the input layout stage can fetch data in different formats, but stream output can write only 32-bit values to target buffers (you can find a note about this in the Direct3D 10 programming guide, at the very end of the Getting Started with the Stream-Output Stage section). Nevertheless, float precision is usually excessive for local models, and it is desirable to output data with half precision.
SM 5.0 has two functions for float-to-half conversion and back: f32tof16() and f16tof32(). DX11 hardware has dedicated silicon for these operations, so they are first-class API citizens. These functions are also available under the SM 4.0 profile, where they are emulated by a series of bit shifts, integer multiplications, etc. I stumbled upon an implementation of these functions in the OpenGL Red Book: Floating-Point Formats Used in OpenGL (the algorithm is probably implemented according to the IEEE 754-2008 specification of the half-precision floating-point format). Some time ago I wrote my own conversion functions that work through a lookup table, but with the advent of SM 5.0 they can be thrown away :)
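To give a feel for what such an emulation involves, here is a minimal Python sketch of the half-to-float direction done with shifts and masks. It is not the Red Book code or the SM 4.0 emulation; the function name f16_bits_to_float is made up for this illustration, and the only assumption is the standard IEEE 754 half layout (1 sign bit, 5 exponent bits, 10 mantissa bits).

```python
import struct

def f16_bits_to_float(h):
    """Decode a 16-bit IEEE 754 half bit pattern using shifts and masks."""
    sign = (h >> 15) & 0x1
    exponent = (h >> 10) & 0x1F
    mantissa = h & 0x3FF
    if exponent == 0:
        # Zero or subnormal: there is no implicit leading one.
        value = mantissa * 2.0 ** -24
    elif exponent == 0x1F:
        # All exponent bits set: infinity or NaN.
        value = float("inf") if mantissa == 0 else float("nan")
    else:
        # Normal number: implicit leading one, exponent bias of 15.
        value = (1.0 + mantissa / 1024.0) * 2.0 ** (exponent - 15)
    return -value if sign else value

# Cross-check one pattern against the half support in Python's struct module.
reference = struct.unpack("<e", struct.pack("<H", 0x3C00))[0]
print(f16_bits_to_float(0x3C00), reference)  # 1.0 1.0
```

The opposite direction needs rounding on top of the bit shuffling, which is where most of the extra work in an emulated f32tof16() would go.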
I came up with the idea that with SM 5.0 we can convert two floats to halves and pack them into one 32-bit value, so the geometry shader streams out (and the input assembler later fetches) half as much data as before. Besides, the important [maxvertexcount] declaration can be reduced: for example, if the GS previously output two vertices, it will now output only one. The main idea is this: SO outputs two halves packed into a single float, and the IA interprets the vertex buffer as R16G16B16A16_FLOAT, so we can easily read each packed vertex.
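The bit layout behind this idea can be tried on the CPU. The sketch below is Python rather than shader code: f32tof16 here is a stand-in built on the struct module, not the HLSL intrinsic, and pack_half2 is a hypothetical helper name. It also assumes that the first value goes into the low 16 bits and that the buffer is little-endian.

```python
import struct

def f32tof16(x):
    """Stand-in for the HLSL intrinsic: half bit pattern of x in the low 16 bits."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def pack_half2(lo, hi):
    """Pack two values, converted to halves, into one 32-bit word."""
    return f32tof16(lo) | (f32tof16(hi) << 16)

# What stream output would write: a single 32-bit value for the pair.
word = pack_half2(1.0, -2.0)
raw = struct.pack("<I", word)

# What a reader sees when told that the same bytes are 16-bit floats.
print(hex(word), struct.unpack("<2e", raw))  # 0xc0003c00 (1.0, -2.0)
```

In a shader the packed bits would have to be reinterpreted as a float rather than converted to one, so that the 32-bit pattern reaches the buffer unchanged.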
Here is the code that packs two three-component vectors into one four-component vector: pck. The fourth component is required because the R16G16B16A16_FLOAT format has four components (6-byte three-component formats were never supported by hardware), but we can ignore it both when packing and when reading from the vertex buffer.
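As a rough CPU-side illustration of that layout (not the linked shader code), the following Python sketch packs two xyz positions into four 32-bit words and reads them back as two half4 values. The helper names and the padding value w are assumptions made for this example.

```python
import struct

def f32tof16(x):
    """Stand-in for the HLSL intrinsic: half bit pattern of x in the low 16 bits."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def pack_half2(lo, hi):
    """Pack two values, converted to halves, into one 32-bit word."""
    return f32tof16(lo) | (f32tof16(hi) << 16)

def pack_two_positions(p1, p2, w=1.0):
    """Pack two xyz positions into four 32-bit words, i.e. two half4 values.

    w only fills the unused fourth component of each half4.
    """
    return (
        pack_half2(p1[0], p1[1]),
        pack_half2(p1[2], w),
        pack_half2(p2[0], p2[1]),
        pack_half2(p2[2], w),
    )

words = pack_two_positions((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))
raw = struct.pack("<4I", *words)      # 16 bytes, the size of one float4
halves = struct.unpack("<8e", raw)    # the same bytes read as eight halves
vertices = [halves[0:4], halves[4:8]]
print(vertices)  # [(1.0, 2.0, 3.0, 1.0), (4.0, 5.0, 6.0, 1.0)]
```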
Packing is easy as long as we stream out an even number of vertices: 2, 4 and so on. But what if we need to output three vertices (say, a triangle)? We can pack the first two vertices into a float4, pack the third vertex into the .xy components of a second float4, and leave .zw uninitialized. But on the subsequent fetch we will read three half4 values, and the fourth, uninitialized one will be treated as part of the next primitive - an error! And we can't define a stride between primitives in the buffer; besides, no one wants to leave gaps in memory.
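The misalignment is easy to see with a toy model of the buffer. In this Python sketch each half4 slot is labelled with the (triangle, vertex) it holds, or None for the unused .zw slot, and the slots are then grouped in threes the way a triangle-list fetch would group them. The function names are made up for the illustration.

```python
def slot_labels(triangle_count):
    """Label the half4 slots of a buffer where each triangle occupies two float4s."""
    labels = []
    for t in range(triangle_count):
        # Three real vertices, then the uninitialized .zw half4.
        labels += [(t, 0), (t, 1), (t, 2), None]
    return labels

def assembled_triangles(labels):
    """Group consecutive half4 slots in threes, as a triangle-list fetch would."""
    return [labels[i:i + 3] for i in range(0, len(labels) - 2, 3)]

triangles = assembled_triangles(slot_labels(2))
print(triangles[0])  # [(0, 0), (0, 1), (0, 2)]
print(triangles[1])  # [None, (1, 0), (1, 1)]
```

The second assembled triangle starts with the padding slot, and every later triangle is shifted further.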
The solution is simple. Where before we would pack, for instance, four vertices into two output vertices, now we have to pack three vertices into one:
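struct gs_out
{
    vec4 pos1_pos2 : Data0; // vertices 1 and 2: two half4 packed into four 32-bit values
    vec2 pos3 : Data1;      // vertex 3: one half4 packed into two 32-bit values
};
[maxvertexcount(1)]
void gs_main(..., inout PointStream< gs_out > stream)
{
    ...
}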
The IA will interpret the vertex buffer as a series of half4 values - that's all.
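The effect of this layout can be checked on the CPU. The Python sketch below mimics it under a few assumptions: each triangle is written as six 32-bit words (a vec4 followed by a vec2, 24 bytes in total), the records are stored back to back, and the helper names and the padding value w are invented for the example.

```python
import struct

def f32tof16(x):
    """Stand-in for the HLSL intrinsic: half bit pattern of x in the low 16 bits."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def pack_half2(lo, hi):
    """Pack two values, converted to halves, into one 32-bit word."""
    return f32tof16(lo) | (f32tof16(hi) << 16)

def pack_triangle(p1, p2, p3, w=1.0):
    """One stream-output record: six 32-bit words (pos1_pos2, then pos3)."""
    words = []
    for p in (p1, p2, p3):
        words.append(pack_half2(p[0], p[1]))
        words.append(pack_half2(p[2], w))  # w only pads the half4
    return struct.pack("<6I", *words)

def read_half4s(buffer):
    """Read the buffer as a plain series of half4 values, 8 bytes each."""
    return [struct.unpack_from("<4e", buffer, 8 * i) for i in range(len(buffer) // 8)]

buffer = pack_triangle((0, 0, 0), (1, 0, 0), (0, 1, 0)) \
       + pack_triangle((2, 2, 2), (3, 2, 2), (2, 3, 2))
vertices = read_half4s(buffer)
print(len(buffer), len(vertices))  # 48 6
print(vertices[3][:3])             # (2.0, 2.0, 2.0)
```

Each record is exactly three half4 values long, so the fourth half4 in the buffer is the first vertex of the second triangle and no padding is needed between primitives.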
Friday, May 6, 2011
T-junction Elimination
I encountered some artifacts when rendering stencil shadows from md2 meshes. As it turned out, even the low-poly models from Quake II are not without bugs (I was hoping that everything would be fine). Apparently this is because they were never intended to cast shadows, although one may notice that they mostly respect the rules of two-manifold geometry.
In general, the converter had to be refined so that it can detect non-adjacent clusters of triangles, eliminate T-junctions, search for adjacency based on coincident vertices, etc. In the end I managed to work around the minor bugs and removed the invalid triangles from the mesh adjacency.
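For illustration only, here is a small Python sketch of one way to detect T-junctions; it is not the converter's actual code. It assumes an indexed triangle list, treats any edge not shared by exactly two triangles as suspicious, and reports every vertex that lies strictly inside such an edge. The function names and the eps tolerance are hypothetical.

```python
from collections import defaultdict

def edge_use_counts(triangles):
    """Count how many triangles use each undirected edge."""
    uses = defaultdict(int)
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            uses[(min(u, v), max(u, v))] += 1
    return dict(uses)

def find_t_junctions(positions, triangles, eps=1e-6):
    """Return (vertex, edge) pairs where the vertex lies strictly inside the edge."""
    found = []
    for (a, b), count in sorted(edge_use_counts(triangles).items()):
        if count == 2:
            continue  # a properly shared edge of a two-manifold mesh
        pa, pb = positions[a], positions[b]
        ab = [q - p for p, q in zip(pa, pb)]
        length2 = sum(c * c for c in ab)
        if length2 == 0:
            continue  # degenerate edge
        for v, pv in enumerate(positions):
            if v in (a, b):
                continue
            av = [q - p for p, q in zip(pa, pv)]
            # Parameter of the projection of v onto the edge, 0 at a and 1 at b.
            t = sum(x * y for x, y in zip(av, ab)) / length2
            # Squared distance from v to its projection on the edge line.
            dist2 = sum((x - t * y) ** 2 for x, y in zip(av, ab))
            if eps < t < 1 - eps and dist2 <= eps * eps:
                found.append((v, (a, b)))
    return found

# One big triangle whose edge 0-1 faces two small triangles split at vertex 3.
positions = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (1, 0, 0), (1, -1, 0)]
triangles = [(0, 1, 2), (0, 4, 3), (3, 4, 1)]
print(find_t_junctions(positions, triangles))  # [(3, (0, 1))]
```

Eliminating a reported junction would then mean splitting the triangle that owns the long edge at that vertex, or dropping the offending triangles from the adjacency data.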
Many of the models do not contain any errors. But the "bitch" model, for example, lost an entire shoulder (as it turned out, that area is a real mess of triangles).