• End of the (cache) tunnel

    At last ! After months of work, my cache is finally up and running ! I spent the last few weeks trying to understand why some pages weren’t properly reloaded. After adding a cell viewer to the debugger, it dawned on me that I wasn’t detecting any color RAM changes, hence the cache trouble I was experiencing …

    Now everything works fine, and I can move on to something else. I will come back to it later, as there’s a lot of room for improvement, but right now I need to work on something different for a change 🙂

    What’s next :
    – understanding why the CD block gets full while playing some videos, and doesn’t clear itself
    – starting to work on the DSP, as it’s used in quite a lot of games
    – adding the rotating backgrounds (yeah, some more VDP2 :/)
    – adding the line / cell scroll (used by some Capcom games)

    I also have to understand why some games (like Radiant Silvergun or Metal Slug) run so slowly in-game … it’s not a display problem, as the speed stays the same when the display is disabled, but it’s annoying.

    Well, that’s all for now 🙂


  • Something fishy with onchip memory

    As I was converting different VDP2 modes to the new cache, I ended up testing the game 3 Dirty Dwarves. This game has had a strange problem since version 0.32 : the main player characters weren’t being displayed on screen. You could move around, see bullets being fired, or punch the villains, but that was it. You could only guess your position from the scrolling.

    At first I thought it was related to the DSP, as some games use it to do some sprite calculations, but after a quick check that wasn’t the case.

    Some more tracing led me to some unmapped memory accesses (which weren’t logged for some reason), and reading through the SH2 hardware manual gave me the answers I was looking for : the SH2 has an internal cache (4KB of data, and 1KB of addresses), which can be configured in three ways : as a 4-way cache (the whole 4KB is used as an instruction / data cache), as a 2-way cache (the first half is used as high-speed RAM, and the other half as a regular cache), or completely disabled (all 4KB are used as high-speed RAM).

    After adding this 4KB area to the memory map of the emulator, bingo ! The sprites are displayed 😀
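
    For the curious, here is a minimal sketch of what mapping that 4KB area could look like. It isn’t Saturnin’s actual code : the class and constant names are made up for the example, and the base address (0xC0000000, the cache data array as I read it in the SH7604 manual) is an assumption you should check against the manual. It only models the “cache disabled, 4KB used as RAM” case.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical names; the base address is an assumption (SH7604 cache data array).
constexpr std::uint32_t kDataArrayBase = 0xC0000000u;
constexpr std::uint32_t kDataArraySize = 0x1000u;  // 4KB

class OnChipRam {
public:
    // True when the address falls inside the 4KB on-chip area.
    static bool contains(std::uint32_t addr) {
        return addr >= kDataArrayBase && addr - kDataArrayBase < kDataArraySize;
    }

    std::uint8_t read8(std::uint32_t addr) const {
        assert(contains(addr));
        return data_[addr - kDataArrayBase];
    }

    void write8(std::uint32_t addr, std::uint8_t value) {
        assert(contains(addr));
        data_[addr - kDataArrayBase] = value;
    }

    // The SH2 is big-endian: the most significant byte sits at the lowest address.
    std::uint32_t read32(std::uint32_t addr) const {
        return (static_cast<std::uint32_t>(read8(addr)) << 24) |
               (static_cast<std::uint32_t>(read8(addr + 1)) << 16) |
               (static_cast<std::uint32_t>(read8(addr + 2)) << 8) |
                static_cast<std::uint32_t>(read8(addr + 3));
    }

    void write32(std::uint32_t addr, std::uint32_t value) {
        write8(addr,     static_cast<std::uint8_t>(value >> 24));
        write8(addr + 1, static_cast<std::uint8_t>(value >> 16));
        write8(addr + 2, static_cast<std::uint8_t>(value >> 8));
        write8(addr + 3, static_cast<std::uint8_t>(value));
    }

private:
    std::array<std::uint8_t, kDataArraySize> data_{};  // zero-initialized
};
```

    In a memory map, `OnChipRam::contains()` would be checked before falling through to the “unmapped access” path. The 2-way mode (only half of the area used as RAM) isn’t handled here.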

    Check the screenshots from the current dev version :

    I’ll try to add some code later to mimic the cache behaviour …


  • Gaining speed

    VDP2 cache debugging wasn’t that easy : I had to redesign the way threads were handled in order to get it working, as you can’t share a rendering context between OpenGL and Windows. I decided to use Boost for those too, as I don’t need anything complicated. Now it works the way I wanted, i.e. you can display backgrounds by priority.

    Bitmaps and cell mode now work using the cache ; I used functions similar to the VDP1 ones in order to handle cached textures.

    I also used the block transfer mode from OpenGL, in order to speed up transfers to the graphics card memory. The results are interesting : during the logo assembly in the bios the speed is over 70 fps, and it’s around 45 fps (with bitmaps enabled) in the cd player.

    I still have to convert all the display modes from the VDP2 to the cache system, and when it’s done I’ll move on (there’s still plenty of stuff to do :p )


  • Happy New Year !

    May 2009 be a great year for everyone 🙂

    There’s not a lot of news regarding Saturnin, as the last month and a half was a peak period at my real job. So my free time was drastically reduced during this period, and I couldn’t work on Saturnin as much as I wanted …

    But that doesn’t mean that I didn’t do anything : I created a local Subversion server and put the source code on it, as I never gave up on the idea of moving the project to open source (don’t expect anything soon though :p)
    I also started using a task manager to handle all the notes lying around and the todo list (it’s Task Coach for those interested), in the hope of having everything a bit more formalized.

    On the code itself, I did some rearrangement to clean everything up, to put all the related functions into the same files, etc. …
    And now I’m back on adding the bitmaps to the rendering engine, but there are some difficulties that can’t be solved right now, forcing me to create a more advanced VDP2 texture debugger.
    So I’m 100% on that right now.


  • VDP2 cache progress

    The cache is finally working for the VDP2 cell mode (the one used in the bios). I had a transparency problem due to textures not being reloaded when the transparency bit of the screen was changed, but that’s now corrected.
    Performance is quite interesting in the current cache state, i.e. around 50 fps when the “Sega Saturn” text is displayed, and between 30 and 40 fps inside the cd player.

    You might say it’s not that fast, but as the whole page is displayed (i.e. 512*512 pixels) instead of just the visible part (i.e. 320*224 in that case), a lot of extra calculation is done, which slows down the display. Of course I’ll change that in the future, but right now it’s already faster than it was in the previous release, so that’s a good start 🙂
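
    To put a rough number on that extra work, here is a small back-of-the-envelope calculation. The helper name is made up for the example, and it assumes the visible area is aligned on cell boundaries (with scrolling, an extra row and column of cells would be touched).

```cpp
// Rough estimate of how many 8*8 cells are drawn per frame.
constexpr int kCellSize = 8;

// Number of whole cells covering a width*height area (hypothetical helper).
constexpr int cellsIn(int width, int height) {
    return (width / kCellSize) * (height / kCellSize);
}

constexpr int kPageCells    = cellsIn(512, 512);  // 64 * 64 cells
constexpr int kVisibleCells = cellsIn(320, 224);  // 40 * 28 cells

// Ratio between what is drawn and what is actually visible.
constexpr double kOverdraw = static_cast<double>(kPageCells) / kVisibleCells;

static_assert(kPageCells == 4096, "a full page holds 4096 cells");
static_assert(kVisibleCells == 1120, "320*224 covers 1120 cells");
```

    So drawing the full page means handling roughly 3.7 times more cells than strictly needed for a 320*224 screen.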

    Now that the cache is running for cell mode, I’m in the process of integrating bitmap mode into it … I took the opportunity to redesign the way both modes are set up, and to put that in different functions.
    My goal is to have the whole bitmap added as a single texture to the texture map. That means modifying the texture class structure, as until now each VDP2 texture was considered to be 8*8.

    Now the bitmap texture will just be considered as a big cell (up to 1024*512), and displayed like the others.

    So what are the next steps ?

    • add bitmaps to the VDP2 cache
    • convert the other display modes to use the cache
    • test the cache speed, and improve it

    That’s all for now (but that’s a lot, believe me :p)


  • New coordinate system

    After spending some time toying with the OpenGL coordinate system (OCS), I was able to get the same one as the Saturn’s. That’s really interesting, as up to now I was using the original OCS (i.e. with values between -1.0 and 1.0), which meant doing all sorts of conversions to get one dot in the Saturn coordinate system (SCS) converted into the OpenGL one. For instance, the dot [50,50] in the SCS had to be converted into something like [0.215, 0.1483] in the OCS. Now it’s one for one, so no more conversions are needed.
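
    As an illustration, one common way to get such a one-for-one mapping is an orthographic projection with the origin in the top-left corner. The sketch below builds the same matrix as `glOrtho(0, width, height, 0, -1, 1)` would, and applies it by hand to show where a Saturn dot ends up in the -1.0 / 1.0 range. This is an assumption about how it can be done, not necessarily what Saturnin does, and the function names are made up.

```cpp
#include <array>
#include <cassert>

struct Vec2 {
    float x;
    float y;
};

// Column-major 4x4 matrix, the layout OpenGL expects.
// Same values as glOrtho(0, width, height, 0, -1, 1): x grows to the right,
// y grows downwards, like the Saturn's screen coordinates.
std::array<float, 16> makeSaturnOrtho(float width, float height) {
    assert(width > 0.0f && height > 0.0f);
    std::array<float, 16> m{};   // all zeros
    m[0]  = 2.0f / width;        // x scale
    m[5]  = -2.0f / height;      // y scale (flipped)
    m[10] = -1.0f;               // z scale
    m[12] = -1.0f;               // x translation
    m[13] = 1.0f;                // y translation
    m[15] = 1.0f;
    return m;
}

// What the projection does to a 2D dot (z = 0, w = 1).
Vec2 toNormalized(const std::array<float, 16>& m, Vec2 dot) {
    return Vec2{m[0] * dot.x + m[12], m[5] * dot.y + m[13]};
}
```

    With a 2048*2048 area, the dot [0,0] lands on [-1,1] (top-left) and [2048,2048] on [1,-1] (bottom-right). Once the matrix is loaded as the projection, vertices can be sent in Saturn coordinates directly.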

    I never dug into that aspect until now, as everything worked fine. But with the cache system I want to set up for the VDP2, that could have become a major drawback.

    So fewer calculations automatically lead to more speed, but on the other hand the video card has to support the same viewport size as the Saturn’s, which is 2048*2048. I’ve sent a version to beta testers to check the maximum viewport supported by various graphics cards, and so far none has failed (even an old S3 from 2001 was able to do so …)
    So I have decided to drop support for graphics cards that don’t support a 2048*2048 viewport for the current version. If there’s a demand I’ll try to do something for older cards, but you’ll have to be convincing 😉

    Yesterday I finished modifying the VDP1 to take into account the new coordinate system and everything went smoothly. I even added the VDP2 planes 🙂 Now the bios is almost back to the way it should look, minus the scrolling (which should be taken care of quickly).

    The next step is to add the cache detection, as the cache is reloaded every frame right now. When that’s done, I’ll have a better view of the cache’s performance …


  • VDP2 texturing

    Back from holidays !
    I’m making some progress on the VDP2 : 512*512 textures are now correctly filled with data.
    Now I have to add it to the rendering engine (with the VDP1).
    After that, depending on how good the perfs are, I’ll extend the code to the whole VDP2 (currently just the code used in the bios is changed)
    Stay tuned, there’s more to come


  • VDP1 updated

    I didn’t think it would be that hard to add this feature to my VDP1 rendering system …
    Anyway, for those interested, the discussion regarding this matter started on this page.

    Now for the good stuff : backgrounds aren’t plugged in as I haven’t finished my VDP2 cache yet, but the sprites are fully functional.

    • Previous rendering :

    • Current rendering :

    [screenshots : bakubaku_ok1, bakubaku_ok2]

    Quite neat heh 🙂
    Perspective correction won’t work in some particular cases (like non-trapezoid quads), but that’s marginal. I’m trying to get info on how to make it work in every possible quad configuration, but it’s getting really technical and mathematical, and I’m not that good at it :p

    Now I’m moving back to my VDP2 cache 😉


  • And now for something a little different

    I’ve always been worried about the way OpenGL renders the Saturn’s distorted polygons. As the Saturn doesn’t specify any Z coordinate (aka depth coordinate) when displaying a polygon, OpenGL has to approximate its value to apply a texture to it.
    When the polygon is a regular quadrangle (i.e. a square, rectangle, etc.), the texture coordinates and the polygon ones are identical, so OpenGL texture mapping is correct. In the case of a distorted quadrangle, only half of the texture coordinates are identical to the polygon coordinates, and the texture seems to be mapped on the polygon as 2 different triangles. (OpenGL always splits quadrangles into 2 triangles, as modern graphics cards only work with triangles)

    Maybe a graphical example will be better to grasp the idea :

    • original texture / texture coordinates (will be used to map the texture to the polygon)

    • texture coordinates + regular quadrangle coordinates (identical to the texture coordinates) = correct texture mapping on the quad

    • texture coordinates + distorted quadrangle coordinates (different from the texture coordinates) = incorrect texture mapping on the quad

    So I did some research. I wasn’t really sure that this problem could be solved without using a software renderer, but I was wrong. Using the projective texture space makes it possible to change the way OpenGL maps the texture coordinates to the quad coordinates, rendering distorted quads neatly (I won’t go into the details :p )
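
    For readers who do want a few details, here is a sketch of one well-known way to use projective texture coordinates on a quad : each vertex gets a weight q computed from where the two diagonals cross, and the texture coordinates are sent as (u*q, v*q, q) instead of (u, v). I’m not claiming this is exactly what the emulator does ; the names are made up, and it only works when the diagonals cross inside the quad (a convex quad).

```cpp
#include <array>
#include <cassert>
#include <optional>

struct Point {
    double x;
    double y;
};

// 2D cross product (z component of the 3D one).
static double cross(Point a, Point b) {
    return a.x * b.y - a.y * b.x;
}

// Vertices are given in drawing order (p0, p1, p2, p3), so the diagonals are
// p0-p2 and p1-p3. Returns one q weight per vertex, or nothing when the
// diagonals don't cross strictly inside the quad.
std::optional<std::array<double, 4>> quadWeights(const std::array<Point, 4>& p) {
    const Point a{p[2].x - p[0].x, p[2].y - p[0].y};  // diagonal p0 -> p2
    const Point b{p[3].x - p[1].x, p[3].y - p[1].y};  // diagonal p1 -> p3
    const Point c{p[1].x - p[0].x, p[1].y - p[0].y};  // p0 -> p1

    const double denom = cross(a, b);
    if (denom == 0.0) {
        return std::nullopt;  // parallel diagonals: degenerate quad
    }
    // Intersection = p0 + t*a = p1 + s*b
    const double t = cross(c, b) / denom;
    const double s = cross(c, a) / denom;
    if (t <= 0.0 || t >= 1.0 || s <= 0.0 || s >= 1.0) {
        return std::nullopt;  // concave or self-crossing quad
    }
    // q_i = (d_i + d_opposite) / d_opposite, where d is the distance from the
    // vertex to the intersection; expressed here with t and s.
    return std::array<double, 4>{1.0 / (1.0 - t), 1.0 / (1.0 - s), 1.0 / t, 1.0 / s};
}
```

    For a square or rectangle all four weights are equal, so the mapping is the same as before. With immediate mode OpenGL the weighted coordinates could then be passed with something like `glTexCoord4f(u * q, v * q, 0.0f, q)`.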

    Here is a sample. I won’t use the same example as above as I haven’t yet implemented it in the VDP1 renderer, but the following screenshots were made with a test renderer in the emu. The left one is rendered the way it was done so far, and the right one using the above technique. Both use a 4*4 black and white checkerboard as texture.

    [screenshot : qcoord_ok]


  • Slowness, the sequel

    Ok. After some more testing, I have to face it : my cache isn’t that good. When the cache is used at full capacity (i.e. nothing is read from the Saturn memory, everything is already in the vector and cells are just displayed to the framebuffer), I only get a 0.5 fps increase …

    So I did some more thinking.
    The cache is organized like this :

    • one map storing 8*8 pixel textures (one texture from the map can be used by one or more cells)
    • one vector storing cells (up to 4096 per page), each cell being linked to a texture in the map

    Currently the cache detects when a cell has changed in the Saturn memory, reloading it if necessary. So when the framebuffer is filled with vector data, each cell is displayed.
    Here’s the catch : this method isn’t using the graphics card memory to store texture data. So every cell displayed is loaded into memory, displayed, then discarded. That costs a lot performance-wise …
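
    To make the layout easier to picture, here is a stripped-down sketch of a map + vector cache with change detection. All the names are hypothetical and the key (I assume something like the VRAM address of the 8*8 pattern) is a guess ; the real classes hold a lot more than this.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical key: assumed to identify one 8*8 pattern in Saturn memory.
using TextureKey = std::uint32_t;

struct Texture {
    std::array<std::uint32_t, 64> rgba{};  // 8*8 decoded pixels
};

struct Cell {
    TextureKey    texture  = 0;      // link to an entry of the texture map
    std::uint32_t rawEntry = 0;      // cell data last seen in Saturn memory
    bool          loaded   = false;  // false until the cell is first cached
};

class CellCache {
public:
    explicit CellCache(std::size_t cellsPerPage = 4096) : cells_(cellsPerPage) {}

    // Compares the cell with what was cached; reloads it only when it changed.
    // Returns true when a reload happened. Throws std::out_of_range on a bad index.
    bool update(std::size_t index, std::uint32_t rawEntry, TextureKey key) {
        Cell& cell = cells_.at(index);
        if (cell.loaded && cell.rawEntry == rawEntry && cell.texture == key) {
            return false;  // unchanged: nothing to do
        }
        cell.texture  = key;
        cell.rawEntry = rawEntry;
        cell.loaded   = true;
        textures_.try_emplace(key);  // textures are shared: added only if missing
        return true;
    }

    std::size_t textureCount() const { return textures_.size(); }
    std::size_t cellCount() const { return cells_.size(); }

private:
    std::map<TextureKey, Texture> textures_;
    std::vector<Cell> cells_;
};
```

    Note that nothing here lives in graphics card memory : the textures stay on the CPU side, which is exactly the weakness described above.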

    So I’ve decided to do it another way :

    • the map and vector contents will stay the same as before
    • display to the framebuffer won’t be done directly : instead, 512*512 pixel textures will be defined, filled with cell data, and stored in the graphics card memory. That way a whole page (4096 cells) will be cached at once, and reused at will.
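
    The “fill a 512*512 texture with cell data” part could look like the sketch below. It’s only an illustration under a few assumptions : cells are laid out row by row in the page (64 cells per row), pixels are already decoded to 32 bits, and the names are made up. The actual upload to the graphics card (for example a single `glTexImage2D` or `glTexSubImage2D` call on the buffer) is left out.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kPageSize    = 512;                    // page is 512*512 pixels
constexpr int kCellSize    = 8;                      // cells are 8*8 pixels
constexpr int kCellsPerRow = kPageSize / kCellSize;  // 64

using CellPixels = std::array<std::uint32_t, kCellSize * kCellSize>;

class PageTexture {
public:
    PageTexture() : pixels_(static_cast<std::size_t>(kPageSize) * kPageSize, 0u) {}

    // Copies one 8*8 cell into its slot of the page (row-major layout assumed).
    void blitCell(int cellIndex, const CellPixels& src) {
        assert(cellIndex >= 0 && cellIndex < kCellsPerRow * kCellsPerRow);
        const int originX = (cellIndex % kCellsPerRow) * kCellSize;
        const int originY = (cellIndex / kCellsPerRow) * kCellSize;
        for (int row = 0; row < kCellSize; ++row) {
            const std::ptrdiff_t dst =
                static_cast<std::ptrdiff_t>(originY + row) * kPageSize + originX;
            std::copy_n(src.begin() + row * kCellSize, kCellSize, pixels_.begin() + dst);
        }
        dirty_ = true;  // the page has to be re-uploaded to the graphics card
    }

    std::uint32_t pixel(int x, int y) const {
        return pixels_[static_cast<std::size_t>(y) * kPageSize + x];
    }

    bool dirty() const { return dirty_; }
    const std::uint32_t* data() const { return pixels_.data(); }  // buffer to upload

private:
    std::vector<std::uint32_t> pixels_;
    bool dirty_ = false;
};
```

    One such buffer takes 512 * 512 * 4 bytes = 1MB at 32 bits per pixel, which gives an idea of how quickly the graphics card memory can fill up if many pages are kept around.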

    I need to be careful not to saturate the graphics card memory, but I expect a huge perf increase 🙂

    And as a nice bonus, this way I can handle per-dot and per-cell priority without much effort 😉

    Now that’s the theory. I hope that I won’t be disappointed by the results …