--------------------------------------------
Descripton:
The patch does provide a new class PixelDataBufferObject which is capable of allocating memory on the GPU side (PBO memory) of arbitrary size. The memory can then further be used to be enabled into read mode (GL_PIXEL_UNPACK_BUFFER_ARB) or in write mode (GL_PIXEL_PACK_BUFFER_ARB). Enabling the buffer into write mode will force the driver to write data from bounded textures into that buffer (i.e. glGetTexImage). Using buffer in read mode give you the possibility to read data from the buffer into a texture with e.g. glTexSubImage or other instuctions. Hence no data is copied over the CPU (host memory), all the operations are done in the GPU memory.
--------------------------------------------
Compatibility:
The new class require the unbindBuffer method from the base class BufferObject to be virtual, which shouldn't break any functionality of already existing classes. Except of this the new class is fully orthogonal to existing one, hence can be safely added into already existing osg system.
--------------------------------------------
Testing:
The new class was tested in the current svn version of osgPPU. I am using the new class to copy data from textures into the PBO and hence provide them to CUDA kernels. Also reading the results back from CUDA is implemented using the provided patch. The given patch gives a possibility of easy interoperability between CUDA and osg (osgPPU ;) )
--------------------------------------------
I think in general it is a better way to derive the PixelBufferObject class from PixelDataBufferObject, since the second one is a generalization of the first one. However this could break the current functionality, hence I haven't implemented it in such a way. However I would push that on a stack of wished osg 3.x features, since this will reflect the OpenGL PBO functionality through the classes better.
"
"To reproduce the bug:
1. Create a template osg::Sequence node (and underlying geometry) but do not attach the node to the current active scenegraph.
2. At some point during the rendering loop (perhaps on a keystroke) clone the sequence node (I use the call:
dynamic_cast<osg::Node*>(templateNode -> clone( osg::CopyOp( (osg::CopyOp::CopyFlags)osg::CopyOp::DEEP_COPY_NODES ) ) )
3. Set the cloned sequence node duration to a value that makes the animation run slower (i.e. 2.0).
4. Start the cloned sequence (using setMode()).
5. Repeat steps 2 \u2013 4 and observe that the cloned sequences do not run slow but run as fast, appearing to ignore the duration that has been set on them.
Looking at the \u2018good documentation\u2019 (2.4 source code), I see that _start is being set to _now (osg::Sequence::setMode(), line 192). Should this not _start not be set to -1.0?"
* When used PDS RenderStage::runCameraSetUp sets flag that FBO has already stencil,depth buffer attached. Prevents adding next depth buffer.
* Sets correct traits for p-buffer if used PDS and something goes wrong with FBO setup or p-buffer is used directly.
* Adds warning to camera if user add depth/stencil already attached through PDS.
* Sets blitMask when use blit to resolve buffer.
There is also new example with using multisampled FBO."
I have not reverted added Compiler options. I assume that one may want to have warnings enabled for the application but may not want to see them while OSG libraries and examples compile.
Modified files:
osg/Export - now explicitly includes osg/Config to make sure OSG_DISABLE_MSVC_WARNINGS is read
osg/Config.in - declares OSG_DISABLE_MSVC_WARNINGS flag to be added to autogenerated osg/Config
CMakeLists.txt - declares OSG_DISABLE_MSVC_WARNINGS as option with default ON setting
"
The fix was to check whether glGetString( GL_VERSION ) returned a null pointer (Ref. svn diff below). The altered src/osg/GLExtensions.cpp is zipped and attached to this email."
set.
The optimization is based on the observation that matrix matrix multiplication
with a dense matrix 4x4 is 4^3 Operations whereas multiplication with a
transform, or scale matrix is only 4^2 operations. Which is a gain of a
*FACTOR*4* for these special cases.
The change implements these special cases, provides a unit test for these
implementation and converts uses of the expensiver dense matrix matrix
routine with the specialized versions.
Depending on the transform nodes in the scenegraph this change gives a
noticable improovement.
For example the osgforest code using the MatrixTransform is about 20% slower
than the same codepath using the PositionAttitudeTransform instead of the
MatrixTransform with this patch applied.
If I remember right, the sse type optimizations did *not* provide a factor 4
improovement. Also these changes are totally independent of any cpu or
instruction set architecture. So I would prefer to have this current kind of
change instead of some hand coded and cpu dependent assembly stuff. If we
need that hand tuned stuff, these can go on top of this changes which must
provide than hand optimized additional variants for the specialized versions
to give a even better result in the end.
An other change included here is a change to rotation matrix from quaterion
code. There is a sqrt call which couold be optimized away. Since we divide in
effect by sqrt(length)*sqrt(length) which is just length ...
"
they are written to disk, either inline or as an external file. Added support for
this in the .ive plugin. Default of WriteHint is NO_PREFERNCE, in which case it's
up to the reader/writer to decide.
occlusion query result is not ready for retrieval when the app tries to
retrieve it. This fix adds an application-level wait loop to ensure the
result is ready for retrieval. This code is not compiled by default; add "-D
FORCE_QUERY_RESULT_AVAILABLE_BEFORE_RETRIEVAL" to get this code.
Full, gory details, to the best of my recollection:
The conditions under which we encountered this issue are as follows: 64-bit
processor, Mac/Linux OS, multiple NVIDIA GPUs, multiple concurrent draw
threads, VRJuggler/SceneView-based viewer, and a scene graph containing
OcclusionQueryNodes. Todd wrote a small test program that produces an almost
instant crash in this environment. We verified the crash does not occur in a
similar environment with a 32-bit processor, but we have not yet tested on
Windows and have not yet tested with osgViewer.
The OpenGL spec states clearly that, if an occlusion query result is not yet
ready, an app can go ahead and attempt to retrieve it, and OpenGL will
simply block until the result is ready. Indeed, this is how
OcclusionQueryNode is written, and this has worked fine on several platforms
and configurations until Todd's test program.
By trial and error and dumb luck, we were able to workaround the crash by
inserting a wait loop that forces the app to only retrieve the query after
OpenGL says it is available. As this should not be required (OpenGL should
do this implicitly, and more efficiently), the wait loop code is not
compiled by default. Developers requiring this work around must explicitly
add "-D FORCE_QUERY_RESULT_AVAILABLE_BEFORE_RETRIEVAL" to the compile
options to include the wait loop."
New attribute DatabasePager::_expiryFrames sets number of frames a PagedLOD child is kept in memory. The attribute is set with DatabasePager::setExpiryFrames method or OSG_EXPIRY_FRAMES environmental variable.
New attribute PagedLOD::PerRangeData::_
frameNumber contains frame number of last cull traversal.
Children of PagedLOD are expired when time _AND_ number of frames since last cull traversal exceed OSG_EXPIRY_DELAY _AND_ OSG_EXPIRY_FRAMES respectively. By default OSG_EXPIRY_FRAMES = 1 which means that nodes from last cull/rendering
traversal will not be expired even if last cull time exceeds OSG_EXPIRY_DELAY. Setting OSG_EXPIRY_FRAMES = 0 revokes previous behaviour of PagedLOD.
Setting OSG_EXPIRY_FRAMES > 0 fixes problems of children reloading in lazy rendering applications. Required behaviour is achieved by manipulating OSG_EXPIRY_DELAY and OSG_EXPIRY_FRAMES together.
Two interface changes are made:
DatabasePager::updateSceneGraph(double currentFrameTime) is replaced by DatabasePager::updateSceneGraph(const osg::FrameStamp &frameStamp). The previous method is in #if 0 clause in the header file. Robert, decide if You want to include it.
PagedLOD::removeExpiredChildren(double expiryTime, NodeList &removedChildren) is deprecated (warning is printed), when subclassing use PagedLOD::removeExpiredChildren(double expiryTime, int expiryFrame, NodeList &removedChildren) instead. "
/users/mvalle/OSG/OpenSceneGraph/src/osg/BufferObject.cpp: In member function `virtual void osg::ElementBufferObject::compileBuffer(osg::State&) const':
/users/mvalle/OSG/OpenSceneGraph/src/osg/BufferObject.cpp:600: warning: cast to pointer from integer of different size"
MemoryBarrier() is used in the implementation, so it should be checked.
This in effect disables the faster atomic ops on msvc 7.1 and older, even if
only the MemoryBarrier() call is missing. But it ensures for the fist cut
that it will build everywhere. If somebody cares for msvc 7.1 enough and has
one for testing installed, he might provide the apropriate defines to guard
that MemoryBarrier() call.
I tested that msvc8 32/64bit still passes the configure tests and compiles.
"
implementation of the atomic increment and decrement into a implementation
file.
This way inlining and compiler optimization can no longer happen for these
implementations, but it fixes compilation on win32 msvc targets. I expect
that this is still faster than with with mutexes.
Also the i386 gcc target gets atomic operations with this patch. By using an
implementation file we can guarantee that we have the right compiler flags
available."
via StateSet::setNestedRenderBin(bool) whether the new RenderBin should be nested
with the existing RenderBin, or be nested with the enclosing RenderStage.
I added a call to allocateDataArray() if rhs has (at least) one valid array, which should allocate the right array according to the type. Since the type was copied from rhs, it should create the same array as rhs has, so then it should copy the data in the following lines.
"
is based on suggested fix from Marco for fixing a crash due to lack of
thread safety in std::ofstream("/dev/null"); The fix is to use a custom stream
buffer that just discards all data. The implementation is also twice as fast
as the old /dev/null based approach.
nvidia 6 months ago and the issue is still "in progress". I've given up
waiting for them!
Platform - various Intel Windows XP SP2 PCs with various nvidia cards
including GeForce 8800 GTS and Quadro FX 4500, and various driver
versions including the latest WHQL 175.16.
I investigated your concerns about glGenerateMipmapEXT being slower than
GL_GENERATE_MIPMAP_SGIS, and for power-of-two textures, to my surprise
it is. For a 512*512 texture, glGenerateMipmapEXT takes on average 10ms,
while GL_GENERATE_MIPMAP_SGIS takes on average 6ms. Therefore I have
modified the code to only use glGenerateMipmapEXT if the texture has a
non-power-of-two width or height. I am resubmitting all the files
previously submitted (only "Texture.cpp" has significant changes since
my previous submission, I've also replaced tabs with spaces in
"Texture").
"
GL_GENERATE_MIPMAP_SGIS is very slow (over half a second for a 720*576
texture). However, glGenerateMipmapEXT() performs well (16ms for the
same texture), so I have modified the attached files to use
Texture::generateMipmap() if glGenerateMipmapEXT is supported, instead
of enabling & disabling GL_GENERATE_MIPMAP_SGIS."
Notes, from Robert Osfield, I've tested the out of the previous path using
GL_GENERATE_MIPMAP_SGIS and non power of two textures on NVidia 7800GT and
Nvidia linux drivers with the image size 720x576 and only get compile times
of 56ms, so the above half second speed looks to be a driver bug. With
Muchael's changes the cost goes done to less than 5ms, so it's certainly
an effective change, even given that Michael's poor expereiences with
GL_GENERATE_MIP_SGIS do look to be a driver bug.
- Solves issues of loading image data into the texture memory
- Print a warning if images are of different dimensions or have different internal formats (GL specification requires images to be the same)
Patch is tested and seems to work fine. It shouldn't break any other functionality. It should go into include/osg and src/osg
"
multi-threaded paging, where the Pager manages threads of reading local
and http files via seperate threads. This makes it possible to smoothly
browse large databases where parts of the data are locally cached while
others are on a remote server. Previously with this type of dataset
the pager would stall all paging while http requests were being served,
even when parts of the models are still loadable virtue of being in the
local cache.
Also as part of the refactoring the DatabaseRequest are now stored in the
ProxyNode/PagedLOD nodes to facilitate quite updating in the cull traversal,
with the new code avoiding mutex locks and searches. Previous on big
databases the overhead involved in make database requests could accumulate
to a point where it'd cause the cull traversal to break frame. The overhead
now is negligable.
Finally OSG_FILE_CACHE support has been moved from the curl plugin into
the DatabasePager. Eventually this functionality will be moved out into
osgDB for more general usage.
From Robert Osfield, refactored the FrameBufferObejcts::_drawBuffers set up so that its done
within the setAttachment method to avoid potential threading/execution order issues.
Introduced code in BoundgingSphere, BoundingBox, ProxyNode and LOD to utilise the above settings.
Added Matrix::value_type, Plane::value_type, BoundingSphere::value_type and BoundingBox::value_type command line
options that report where the types of floats or doubles.
and a new scheme for computing the scaling when using autoscale that introduces smooth
transitions to the scaling of the subgraph so that it looks more natural.
The win32 pbuffer implementation returned an error unless both the
WGL_ARB_pbuffer and the WGL_ARB_render_texture functions were present.
This was too restrictive, as a pbuffer can usefully be created without
render-to-texture, e.g. for use with glReadPixels. The osg 1.2/Producer
pbuffers worked without RTT, and osgUtil::RenderStage has all the code to
handle both RTT and non-RTT pbuffers, doing a read and copy in the
latter case.
With these changes I have successfully tested the osgprerender example
on a graphics card which supports RTT, and one which doesn't. Plus
tested in my own application.
In order to aid diagnostics I have also added more function status
return checks, and associated error messages. I have included the win32
error text in all error messages output. And there were some errors
with multi-threaded handling of "bind to texture" and a temporary window
context which I have corrected.
These is one (pre-existing) problem with multi-threaded use of pbuffers
in osgViewer & osgprerender, which I have not been able to fix. A win32
device context (HDC) can only be destroyed from the thread that created
it. The pbuffers for pre-render cameras are created in
osgUtil::RenderStage::runCameraSetUp, from the draw thread. But
closeImplementation is normally invoked from the destructor in the main
application thread. With the additional error messages I have added,
osgprerender will now output a couple of warnings from
osgViewer::PixelBufferWin32::closeImplementation() at exit, after
running multi-threaded on windows. I think that is a good thing, to
highlight the problem. I looked into fixing it in osgViewer::Renderer &
osgUtil::RenderStage, but it was too involved for me. My own
application requirements are only single-threaded.
Unrelated fix - an uninitialised variable in
osg::GraphicsThread::FlushDeletedGLObjectsOperation().
"
there is a bug. The header file do specify something
like this:
FrameBufferAttachment(Texture3D* target, int zoffset,
int level = 0);
However in the .cpp file we have:
FrameBufferAttachment::FrameBufferAttachment(Texture3D*
target, int level, int zoffset)
Which means that the meaning of level and zoffset is
interchanged.
The file with the corrected line is attached. Should
go into src/osg/
"
pbuffer functions or exactly ask for the extensions we need to call the
apropriate glx extension functions for and around pbuffers extensions.
The glx 1.3 version of this functios are prefered. If this is not pressent we
are looking for the glx extensions and check for them.
Prevously we just used some mix of the glx 1.3 functions or the extension
functions without making sure that this extension is present.
"