Deferred rendering is now the de-facto standard rendering technique in many modern game engines, hence I think it is important to have this technique demonstrated in an osg code example.
This particular sample adds soft shadows as well as bump mapping into the rendering pipeline. The image files whitemetal_diffuse.jpg and whitemetal_normal.jpg from OpenSceneGraph-Data images folder are required (The OSG_FILE_PATH environment variable must be set correctly)
Two additional osgt models are included with the demo (best to also put them into OpenSceneGraph-Data, I think.
The shaders are currently defined in separate .frag and .vert files.
algorithm consisting of two consequent phases :
- first phase is a GLSL shader performing object culling and LOD picking ( a culling shader ).
Every culled object is represented as GL_POINT in the input osg::Geometry.
The output of the culling shader is a set of object LODs that need to be rendered.
The output is stored in texture buffer objects. No pixel is drawn to the screen
because GL_RASTERIZER_DISCARD mode is used.
- second phase draws osg::Geometry containing merged LODs using glDrawArraysIndirect()
function. Information about quantity of instances to render, its positions and other
parameters is sourced from texture buffer objects filled in the first phase.
The example uses various OpenGL 4.2 features such as texture buffer objects,
atomic counters, image units and functions defined in GL_ARB_shader_image_load_store
extension to achieve its goal and thus will not work on graphic cards with older OpenGL
versions.
The example was tested on Linux and Windows with NVidia 570 and 580 cards.
The tests on AMD cards were not conducted ( due to lack of it ).
The tests were performed using OSG revision 14088.
The main advantages of this rendering method :
- instanced rendering capable of drawing thousands of different objects with
almost no CPU intervention ( cull and draw times are close to 0 ms ).
- input objects may be sourced from any OSG graph ( for example - information about
object points may be stored in a PagedLOD graph. This way we may cover the whole
countries with trees, buildings and other objects ).
Furthermore if we create osgDB plugins that generate data on the fly, we may
generate information for every grass blade for that country.
- every object may have its own parameters and thus may be distinct from other objects
of the same type.
- relatively low memory footprint ( single object information is stored in a few
vertex attributes ).
- no GPU->CPU roundtrip typical for such methods ( method uses atomic counters
and glDrawArraysIndirect() function instead of OpenGL queries. This way
information about quantity of rendered objects never goes back to CPU.
The typical GPU->CPU roundtrip cost is about 2 ms ).
- this example also shows how to render dynamic objects ( objects that may change
its position ) with moving parts ( like car wheels or airplane propellers ) .
The obvious extension to that dynamic method would be the animated crowd rendering.
- rendered objects may be easily replaced ( there is no need to process the whole
OSG graphs, because these graphs store only positional information ).
The main disadvantages of a method :
- the maximum quantity of objects to render must be known beforehand
( because texture buffer objects holding data between phases have constant size ).
- OSG statistics are flawed ( they don't know anymore how many objects are drawn ).
- osgUtil::Intersection does not work
Example application may be used to make some performance tests, so below you
will find some extended parameter description :
--skip-dynamic - skip rendering of dynamic objects if you only want to
observe static object statistics
--skip-static - the same for static objects
--dynamic-area-size - size of the area for dynamic rendering. Default = 1000 meters
( square 1000m x 1000m ). Along with density defines
how many dynamic objects is there in the example.
--static-area-size - the same for static objects. Default = 2000 meters
( square 2000m x 2000m ).
Example application defines some parameters (density, LOD ranges, object's triangle count).
You may manipulate its values using below described modifiers:
--density-modifier - density modifier in percent. Default = 100%.
Density ( along with LOD ranges ) defines maximum
quantity of rendered objects. registerType() function
accepts maximum density ( in objects per square kilometer )
as its parameter.
--lod-modifier - defines the LOD ranges. Default = 100%.
--triangle-modifier - defines the number of triangles in finally rendered objects.
Default = 100 %.
--instances-per-cell - for static rendering the application builds OSG graph using
InstanceCell class ( this class is a modified version of Cell class
from osgforest example - it builds simple quadtree from a list
of static instances ). This parameter defines maximum number
of instances in a single osg::Group in quadtree.
If, for example, you modify it to value=100, you will see
really big cull time in OSG statistics ( because resulting
tree generated by InstanceCell will be very deep ).
Default value = 4096 .
--export-objects - write object geometries and quadtree of instances to osgt files
for later analysis.
--use-multi-draw - use glMultiDrawArraysIndirect() instead of glDrawArraysIndirect() in a
draw shader. Thanks to this we may render all ( different ) objects
using only one draw call. Requires OpenGL version 4.3 and some more
work from me, because now it does not work ( probably I implemented
it wrong, or Windows NVidia driver has errors, because it hangs
the apllication at the moment ).
This application is inspired by Daniel Rákos work : "GPU based dynamic geometry LOD" that
may be found under this address : http://rastergrid.com/blog/2010/10/gpu-based-dynamic-geometry-lod/
There are however some differences :
- Daniel Rákos uses GL queries to count objects to render, while this example
uses atomic counters ( no GPU->CPU roundtrip )
- this example does not use transform feedback buffers to store intermediate data
( it uses texture buffer objects instead ).
- I use only the vertex shader to cull objects, whereas Daniel Rákos uses vertex shader
and geometry shader ( because only geometry shader can send more than one primitive
to transform feedback buffers ).
- objects in the example are drawn using glDrawArraysIndirect() function,
instead of glDrawElementsInstanced().
Finally there are some things to consider/discuss :
- the whole algorithm exploits nice OpenGL feature that any GL buffer
may be bound as any type of buffer ( in our example a buffer is once bound
as a texture buffer object, and later is bound as GL_DRAW_INDIRECT_BUFFER ).
osg::TextureBuffer class has one handy method to do that trick ( bindBufferAs() ),
and new primitive sets use osg::TextureBuffer as input.
For now I added new primitive sets to example ( DrawArraysIndirect and
MultiDrawArraysIndirect defined in examples/osggpucull/DrawIndirectPrimitiveSet.h ),
but if Robert will accept its current implementations ( I mean - primitive
sets that have osg::TextureBuffer in constructor ), I may add it to
osg/include/PrimitiveSet header.
- I used BufferTemplate class writen and published by Aurelien in submission forum
some time ago. For some reason this class never got into osg/include, but is
really needed during creation of UBOs, TBOs, and possibly SSBOs in the future.
I added std::vector specialization to that template class.
- I needed to create similar osg::Geometries with variable number of vertices
( to create different LODs in my example ). For this reason I've written
some code allowing me to create osg::Geometries from osg::Shape descendants.
This code may be found in ShapeToGeometry.* files. Examples of use are in
osggpucull.cpp . The question is : should this code stay in example, or should
it be moved to osgUtil ?
- this remark is important for NVidia cards on Linux and Windows : if
you have "Sync to VBlank" turned ON in nvidia-settings and you want to see
real GPU times in OSG statistics window, you must set the power management
settings to "Prefer maximum performance", because when "Adaptive mode" is used,
the graphic card's clock may be slowed down by the driver during program execution
( On Linux when OpenGL application starts in adaptive mode, clock should work
as fast as possible, but after one minute of program execution, the clock slows down ).
This happens when GPU time in OSG statistics window is shorter than 3 ms.
"
git-svn-id: http://svn.openscenegraph.org/osg/OpenSceneGraph/trunk@14531 16af8721-9629-0410-8352-f15c8da7e697
To select standard OpenGL 1/2 build with full backwards and forwards comtability use:
./configure
make
OR
./configure -DOPENGL_PROFILE=GL2
To select OpenGL 3 core profile build using GL3/gl3.h header:
./configure -DOPENGL_PROFILE=GL3
To select OpenGL Arb core profile build using GL/glcorearb.h header:
./configure -DOPENGL_PROFILE=GLCORE
To select OpenGL ES 1.1 profile use:
./configure -DOPENGL_PROFILE=GLES1
To select OpenGL ES 2 profile use:
./configure -DOPENGL_PROFILE=GLES2
Using OPENGL_PROFILE will select all the appropriate features required so no other settings in cmake will need to be adjusted.
The new configuration options are stored in the include/osg/OpenGL header that deprecates the old include/osg/GL header.
- add non square matrix
- add double
- add all uniform type available in OpenGL 4.2
- backward compatibility for Matrixd to set/get an float uniform matrix
- update of IVE / Wrapper ReadWriter
implementation of AtomicCounterBuffer based on BufferIndexBinding
add example that use AtomicCounterBuffer and show rendering order of fragments,
original idea from geeks3d.com."
osgshaders.cpp and demonstrates the use of GLSL vertex and fragment
shaders with a simple animation callback. I found the osgshaders.cpp
too complex to serve as a starting point for GLSL programming"
10.6), which will forward all multi-touch events from a trackpad to the
corresponding osgGA-event-structures.
The support is switched off per default, but you can enable multi-touch
support via a new flag for GraphicsWindowCocoa::WindowData or directly
via the GraphicsWindowCocoa-class.
After switching multi-touch-support on, all mouse-events from the
trackpad get ignored, otherwise you'll have multiple events for the same
pointer which is very confusing (as the trackpad reports absolute
movement, and as a mouse relative movement).
I think this is not a problem, as multi-touch-input is a completely
different beast as a mouse, so you'll have to code your own
event-handlers anyway.
While coding this stuff, I asked myself if we should refactor
GUIEventAdapter/EventQueue and assign a specific event-type for
touch-input instead of using PUSH/DRAG/RELEASE. This will make it
clearer how to use the code, but will break the mouse-emulation for the
first touch-point and with that all existing manipulators. What do you
think? I am happy to code the proposed changes.
Additionally I created a small (and ugly) example osgmultitouch which
makes use of the osgGA::MultiTouchTrackballManipulator, shows all
touch-points on a HUD and demonstrates how to get the touchpoints from
an osgGA::GUIEventAdapter.
There's even a small example video here: http://vimeo.com/31611842"
"- In order to build against GLES1 we execute:
$ mkdir build_android_gles1
$ cd build_android_gles1
$ cmake .. -DOSG_BUILD_PLATFORM_ANDROID=ON -DDYNAMIC_OPENTHREADS=OFF
-DDYNAMIC_OPENSCENEGRAPH=OFF -DANDROID_NDK=<path_to_android_ndk>/
-DOSG_GLES1_AVAILABLE=ON -DOSG_GL1_AVAILABLE=OFF
-DOSG_GL2_AVAILABLE=OFF -DOSG_GL_DISPLAYLISTS_AVAILABLE=OFF -DJ=2
-DOSG_CPP_EXCEPTIONS_AVAILABLE=OFF
$ make
If all is correct you will have and static OSG inside:
build_android_gles1/bin/ndk/local/armeabi.
- GLES2 is not tested/proved, but I think it could be possible build
it with the correct cmake flags.
- The flag -DJ=2 is used to pass to the ndk-build the number of
processors to speed up the building.
- make install is not yet supported."
contributed my testcase/demo for the original implementation.
This attached change is similar to osgtext but uses the QFontImplementation in
a Qt based viewer.
With that, it should be easier for all of us to test changes in
qfontimplementation"
I uses a queue of Camera objects to do offscreen rendering with the Camera::attach() function. The entire picture is split into many tiles and it will take a few seconds while attaching and detaching cameras with tiles. You may select to output every tile as an image file, or combine them together to create a large poster, for example, a 12800 x 9600 image.
Start the program like this:
./osgposter --output-poster --poster output.bmp --tilesize 800 600 --finalsize 8000 6000 cow.osg
Adjust the scene camera to a suitable position and press 'p' or 'P' on the keyboard. Wait until sub-cameras dispatching is finished. And the poster file will be created while closing window. A 8000 x 6000 output.bmp will be created to show a fine-printed cow. :)
The command below may also help:
./osgposter --help
"
changes from the DirectInput devices and add events to the event
queue. I've tested with the keyboard and joystick supports. Because of
only having a very old 6-button gamepad, I can't do more experiments.
Hope this will bring more ideas to those who face similar problems,
especially simulation game designers. :)
I didn't map all DirectInput key values to GUIEventAdapter key
symbols. Users may add more in the buildKeyMap() function freely. The
mouse handling operations are also ignored, but will be easily
improved in the same way of creating keyboard and joystick devices.
Please add a line:
FIND_PACKAGE(DirectInput)
in the CMakeLists of root directory. And in the examples/CMakeLists.txt:
IF(DIRECTINPUT_FOUND)
ADD_SUBDIRECTORY(osgdirectinput)
ENDIF(DIRECTINPUT_FOUND)
DirectX SDK 2009 is used here, but an older version like DX8 should
also work in my opinion.
"
A few things remain to do:
* The binding between a uniform block in a shader program and a buffer indexed target number is fixed, like a vertex attribute binding. This is too restrictive because that binding can be changed without relinking the program. This mapping should be done by name in the same way that uniform values are handled i.e., like a pseudo state attribute;
* There's no direct way yet to query for the offset of uniforms in uniform block, so only the std140 layout is really usable. A helper class that implemented the std140 rules would be quite helpful for setting up uniform blocks without having to link a program first;
* There's no direct support for querying parameters such as the maximum block length, minimum offset alignment, etc. Having that information available outside of the draw thread would make certain instancing techniques easier to implement."
attached you'll find the second part of the IOS-submission. It contains
* GraphicsWindowIOS, which supports external and "retina" displays,
multisample-buffers (for IOS > 4.0) and multi-touch-events
* an ios-specific implementation of the imageio-plugin
* an iphone-viewer example
* cMake support for creating a xcode-project
* an updated ReadMe-file describing the necessary steps to get a
working xcode-project-file from CMake
Please credit Thomas Hogarth and Stephan Huber for these changes.
This brings the ios-support in line with the git-fork on github. It
needs some more testing and some more love, the cmake-process is still a
little complicated.
You'll need a special version of the freetype lib compiled for IOS,
there's one bundled in the OpenFrameworks-distribution, which can be used."
Notes, from Robert Osfield, modified CMakeLists.txt files so that the IOS specific paths are within IF(APPLE) blocks.
Also I've done the osguserstats example. I've kept the "toy example" that was in the modified osgviewer.cpp I had sent you, because they show different uses of custom stats lines (a value displayed directly, a value without bars and a value with bars and graph). I also added a function and a thread that will sleep for a given number of milliseconds and record this time in the stats. I think it clearly shows how to record the time some processing takes and add that to the stats graph, whether the processing takes place on the same thread as the viewer or on another thread.
BTW, feel free to modify the colors I've given to each user stats line... I'm not very artistic. :-)
I've also added more doc comments to the addUserStats() method in ViewerEventHandlers, so hopefully the arguments are clear and the way to get the results you want is also clear. Maybe I went overboard, but the function makes some assumptions that may not be obvious and has many arguments, so I preferred to be explicit."
changed extensions from .c to .cpp and got compiling as C files as part of the osg core library.
Updated and cleaned up the rest of the OSG to use the new internal GLU.
type is supported at present. The attached osgparticleshader.cpp will
show how it works. It can also be placed in the examples folder. But I
just wonder how this example co-exists with another two (osgparticle
and osgparticleeffect)?
Member variables in Particle, including _alive, _current_size and
_current_alpha, are now merged into one Vec3 variable. Then we can
make use of the set...Pointer() methods to treat them as vertex
attribtues in GLSL. User interfaces are not changed.
Additional methods of ParticleSystem are introduced, including
setDefaultAttributesUsingShaders(), setSortMode() and
setVisibilityDistance(). You can see how they work in
osgparticleshader.cpp.
Additional user-defined particle type is introduced. Set the particle
type to USER and attach a drawable to the template. Be careful because
of possible huge memory consumption. It is highly suggested to use
display lists here.
The ParticleSystemUpdater can accepts ParticleSystem objects as child
drawables now. I myself think it is a little simpler in structure,
than creating a new geode for each particle system. Of course, the
latter is still compatible, and can be used to transform entire
particles in the world.
New particle operators: bounce, sink, damping, orbit and explosion.
The bounce and sink opeartors both use a concept of domains, and can
simulate a very basic collision of particles and objects.
New composite placer. It contains a set of placers and emit particles
from them randomly. The added virtual method size() of each placer
will help determine the probability of generating.
New virtual method operateParticles() for the Operator class. It
actually calls operate() for each particle, but can be overrode to use
speedup techniques like SSE, or even shaders in the future.
Partly fix a floating error of 'delta time' in emitter, program and
updaters. Previously they keep the _t0 variable seperately and compute
different copies of dt by themseleves, which makes some operators,
especially the BounceOperator, work incorrectly (because the dt in
operators and updaters are slightly different). Now a getDeltaTime()
method is maintained in ParticleSystem, and will return the unique dt
value (passing by reference) for use. This makes thing better, but
still very few unexpected behavours at present...
All dotosg and serialzier wrappers for functionalities above are provided.
...
According to some simple tests, the new shader support is slightly
efficient than ordinary glBegin()/end(). That means, I haven't got a
big improvement at present. I think the bottlenack here seems to be
the cull traversal time. Because operators go through the particle
list again and again (for example, the fountain in the shader example
requires 4 operators working all the time).
A really ideal solution here is to implement the particle operators in
shaders, too, and copy the results back to particle attributes. The
concept of GPGPU is good for implementing this. But in my opinion, the
Camera class seems to be too heavy for realizing such functionality in
a particle system. Myabe a light-weight ComputeDrawable class is
enough for receiving data as textures and outputting the results to
the FBO render buffer. What do you think then?
The floating error of emitters
(http://lists.openscenegraph.org/pipermail/osg-users-openscenegraph.org/2009-May/028435.html)
is not solved this time. But what I think is worth testing is that we
could directly compute the node path from the emitter to the particle
system rather than multiplying the worldToLocal and LocalToWorld
matrices. I'll try this idea later.
"
short oit. This rendering technique is also known as depth peeling.
Attached is the example that makes depth peeling work with the fixed function
pipeline. Ok, this is 'old fashioned' but required for our use case that
still has to work on older UNIX OpenGL implementations as well as together
with a whole existing application making use of the fixed function pipeline.
I can imagine to add support for shaders when we have that shader composition
framework where we can add a second depth test in a generic way.
This does *not* implement the dual depth peeling described in a paper from the
ETH Zurich.
This example could serve as a test case for the feature that you can on the
fly remove pre render cameras that you made work a few time ago.
It is also a test case for the new TraversalOrderBin that is used to composite
the depth layers in the correct blend order.
This example also stresses your new texture object cache since you can change
some parameters for the oit implementation at runtime.
You can just load any model with osgoit and see how it works.
Use the usual help key to see what you can change.
There is already an osgdepthpeeling example that I could not really make sense
of up to now. So I just made something new without touching what I do not
understand."