Back from holidays !
I’m making some progress on the VDP2 : 512*512 textures are now correctly filled with data.
Now I have to add it to the rendering engine (with the VDP1).
After that, depending on how good the performance is, I’ll extend the code to the whole VDP2 (currently only the code used in the BIOS has been changed)
Stay tuned, there’s more to come
VDP1 updated
I didn’t think it would be that hard to add this feature to my VDP1 rendering system …
Anyway, for those interested, the discussion regarding this matter started on this page.
Now for the good stuff : backgrounds aren’t plugged in as I haven’t finished my VDP2 cache yet, but the sprites are fully functional.
- Previous rendering :
- Current rendering :
Quite neat heh 🙂
Perspective correction won’t work in some particular cases (like non-trapezoid quads), but it’s marginal. I’m trying to get info on how to make it work in every possible quad configuration, but it’s getting really technical and mathematical, and I’m not that good at it :p
Now I’m moving back to my VDP2 cache 😉
And now for something a little different
I’ve always been worried about the way OpenGL renders the Saturn’s distorted polygons. As the Saturn doesn’t specify any Z coordinate (aka depth coordinate) when displaying a polygon, OpenGL has to approximate its value to apply a texture to it.
When the polygon is a regular quadrangle (ie a square, rectangle, etc. ), the texture coordinates and the polygon ones are identical, so OpenGL texture mapping is correct. In the case of a distorted quadrangle, only half of the texture coordinates are identical to the polygon coordinates, and the texture seems to be mapped on the polygon as 2 different triangles. (OpenGL always splits quadrangles into 2 triangles as modern graphics cards only work with triangles)
Maybe a graphical example will be better to grasp the idea :
- original texture / texture coordinates (will be used to map the texture to the polygon)
- texture coordinates + regular quadrangle coordinates (identical to the texture coordinates) = correct texture mapping on the quad
- texture coordinates + distorted quadrangle coordinates (different from the texture coordinates) = incorrect texture mapping on the quad
So I did some research. I wasn’t really sure that this problem could be solved without using a software renderer, but I was wrong. Using the texture projective space makes it possible to change the way OpenGL maps the texture coordinates to the quad coordinates, so distorted quads are rendered neatly (I won’t go into the details :p )
Here is a sample. I won’t use the same example as above as I haven’t yet implemented it in the VDP1 renderer, but the following screenshots were done through a test renderer in the emu. The left one is rendered the way it has been done so far, and the right one using the above technique. Both use a 4*4 black and white checkerboard as texture.
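For readers who want the details anyway, here is a minimal sketch of one well-known way to use the projective texture coordinate. It is not necessarily the code used in the emu: the names `Vec2`, `TexCoord` and `projectiveTexCoords` are made up for the illustration. The idea is to find where the two diagonals of the quad cross, and to derive a `q` value for each corner from how far that corner is from the crossing point compared to the opposite corner.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

struct Vec2 { float x, y; };
struct TexCoord { float s, t, q; };   // projective coordinates (s*q, t*q, q)

// Hypothetical helper: fills 'out' with projective texture coordinates for a
// convex quad. Returns false when the diagonals do not cross inside the quad
// (degenerate or non-convex shape), in which case 'out' is left untouched.
bool projectiveTexCoords(const std::array<Vec2, 4>& p,    // quad corners, in drawing order
                         const std::array<Vec2, 4>& uv,   // regular texture coordinates
                         std::array<TexCoord, 4>& out)
{
    // Diagonal directions: p0 -> p2 and p1 -> p3.
    const float ax = p[2].x - p[0].x, ay = p[2].y - p[0].y;
    const float bx = p[3].x - p[1].x, by = p[3].y - p[1].y;
    const float cross = ax * by - ay * bx;
    if (std::fabs(cross) < 1e-6f) return false;           // parallel diagonals

    // Solve p0 + k02 * a = p1 + k13 * b for the crossing point.
    const float cx = p[1].x - p[0].x, cy = p[1].y - p[0].y;
    const float k02 = (cx * by - cy * bx) / cross;        // position along p0 -> p2
    const float k13 = (cx * ay - cy * ax) / cross;        // position along p1 -> p3
    if (k02 <= 0.f || k02 >= 1.f || k13 <= 0.f || k13 >= 1.f) return false;

    // q for a corner = (whole diagonal) / (distance from crossing to the opposite corner).
    const std::array<float, 4> q = {
        1.f / (1.f - k02), 1.f / (1.f - k13), 1.f / k02, 1.f / k13
    };
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = TexCoord{ uv[i].x * q[i], uv[i].y * q[i], q[i] };
    }
    return true;
}
```

With legacy OpenGL each corner would then be sent with something like `glTexCoord4f(tc.s, tc.t, 0.0f, tc.q)` before its `glVertex` call. On a square all four `q` values are equal, so the result is the same as the regular mapping; on a trapezoid the wider edge gets the larger `q`.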
Slowness, the sequel
Ok. After some more testing, I have to face it : my cache isn’t that good. When the cache is used at full capacity (ie nothing is read from the Saturn memory, everything is already in the vector and cells are just displayed to the framebuffer), I only get a 0.5 fps increase …
So I did some more thinking.
The cache is organized like this :
- one map storing 8*8 pixels textures (one texture from the map can be used by one or more cells)
- one vector storing cells (up to 4096 per page), each cell being linked to a texture in the map
Currently the cache detects when a cell has changed in the Saturn memory, reloading it if necessary. So when the framebuffer is filled with vector data, each cell is displayed.
Here’s the catch : this method doesn’t use the graphics card memory to store texture data. So every cell displayed is loaded in memory, displayed, then discarded. That costs a lot performance-wise …
So I’ve decided to do it another way :
- the map and vector contents will stay the same as before
- display to the framebuffer won’t be done directly : instead 512*512 pixels textures will be defined, filled with cell data, and stored in the graphics card memory. That way a whole page (4096 cells) will be cached at once, and reused at will.
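To make the second point more concrete, here is a rough sketch of how 8*8 cells could be packed into one 512*512 page buffer before it is handed to the graphics card. The names (`cellOrigin`, `blitCell`), the row-major cell order and the 32 bits per pixel format are assumptions made for the example, not the actual Saturnin code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kCellSize     = 8;                              // cells are 8*8 pixels
constexpr int kPageSize     = 512;                            // page texture is 512*512 pixels
constexpr int kCellsPerRow  = kPageSize / kCellSize;          // 64
constexpr int kCellsPerPage = kCellsPerRow * kCellsPerRow;    // 4096

struct PixelPos { int x, y; };

// Top-left pixel of a cell inside the page, assuming cells are laid out row by row.
PixelPos cellOrigin(int cellIndex) {
    assert(cellIndex >= 0 && cellIndex < kCellsPerPage);
    return PixelPos{ (cellIndex % kCellsPerRow) * kCellSize,
                     (cellIndex / kCellsPerRow) * kCellSize };
}

// Copies one decoded cell (64 pixels, row-major, 32 bits each) into the page buffer.
void blitCell(std::vector<std::uint32_t>& page, int cellIndex,
              const std::uint32_t* cellPixels) {
    assert(page.size() == static_cast<std::size_t>(kPageSize) * kPageSize);
    const PixelPos origin = cellOrigin(cellIndex);
    for (int row = 0; row < kCellSize; ++row) {
        for (int col = 0; col < kCellSize; ++col) {
            page[static_cast<std::size_t>(origin.y + row) * kPageSize + (origin.x + col)] =
                cellPixels[row * kCellSize + col];
        }
    }
}
```

Once a page buffer is complete it could be uploaded in a single call (for instance with `glTexImage2D`) and drawn as one textured quad. At 32 bits per pixel one such page takes 512 * 512 * 4 bytes = 1 MiB of texture memory.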
I need to be careful not to saturate the graphics card memory, but I expect a huge perf increase 🙂
And as a nice bonus, this way I can handle per dot and per cell priority without much effort 😉
Now that’s the theory. I hope that I won’t be disappointed by the results …
Slowness …
My cache isn’t crashing anymore, but there’s another problem now : speed is way too slow 🙁
At the end of the Sega Saturn logo assembly in the BIOS, where the VDP2 is used for the first time, speed slows down to 2 fps … I’ve tried to profile the program to see where the bottleneck is, but it was of no use.
However, I suspect it comes from the fact that I’m now filling the list with the whole VDP2 page (512*512 pixels) instead of just the display area as I did until now (320*224 in that case). That means a lot of extra calculations. I’ll do some testing tonight to see if I’m right, and what can be done if that’s the case …
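To put rough numbers on that suspicion, here is a small back-of-the-envelope calculation. It only compares raw pixel and cell counts, assuming 8*8 pixel cells, and says nothing about where the time is really spent.

```cpp
// Whole VDP2 page versus the visible area, counted in pixels and in 8*8 cells.
constexpr int kPagePixels    = 512 * 512;                 // 262144
constexpr int kDisplayPixels = 320 * 224;                 // 71680
constexpr int kPageCells     = (512 / 8) * (512 / 8);     // 64 * 64 = 4096
constexpr int kDisplayCells  = (320 / 8) * (224 / 8);     // 40 * 28 = 1120

static_assert(kPagePixels == 262144, "page pixel count");
static_assert(kDisplayPixels == 71680, "display pixel count");
static_assert(kPageCells == 4096, "page cell count");
static_assert(kDisplayCells == 1120, "display cell count");

// Ratio of work when every cell of the page is processed instead of the visible ones.
constexpr double kPageToDisplayRatio =
    static_cast<double>(kPageCells) / kDisplayCells;      // roughly 3.66
```

So filling the whole page means handling about 3.66 times as many cells as filling only the display area.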
Stay tuned !
VDP2 problem found
Great news ! I think I’ve found out where my problem is …
Actually I was using one vector to store texture data, and another one to store parts to be displayed. Each VDP2 background is split into smaller parts of 8*8 pixels, each of them having a texture linked to it.
But this link pointed to the texture data outside the vector instead of inside it, meaning that a texture value which was correct when the part was created was no longer valid when it was displayed, leading to a crash as the pointers were invalid …
I’ve decided to use a map instead of a vector to store the list of textures, as the key to access data can be easily calculated (texture address + color mode). It needs a lot of modifications as the program wasn’t supposed to work that way, but I think that’s the right way, as now the texture will be referenced by its map key in the VDP2 part instead of a pointer to the texture …
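As an illustration of the idea, here is a minimal sketch of a texture map keyed by address and color mode. The key layout (address shifted left by 8 bits, color mode in the low byte) and the names `makeTextureKey`, `Part` and `TextureCache` are assumptions made for the example; the real code may build its key differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using TextureKey = std::uint64_t;

// Hypothetical key layout: texture address in the upper bits, color mode in the low byte.
constexpr TextureKey makeTextureKey(std::uint32_t address, std::uint8_t colorMode) {
    return (static_cast<TextureKey>(address) << 8) | colorMode;
}

struct Texture { std::vector<std::uint32_t> pixels; };   // decoded 8*8 texture

// A displayed part stores the key of its texture, never a pointer to it.
struct Part { int x; int y; TextureKey textureKey; };

class TextureCache {
public:
    // Stores the texture under its computed key and returns that key.
    // If the key is already present, the existing entry is kept.
    TextureKey add(std::uint32_t address, std::uint8_t colorMode, Texture texture) {
        const TextureKey key = makeTextureKey(address, colorMode);
        textures_.emplace(key, std::move(texture));
        return key;
    }

    // Looks the texture up at display time; returns nullptr when it is not cached.
    const Texture* find(TextureKey key) const {
        const auto it = textures_.find(key);
        return it == textures_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return textures_.size(); }

private:
    std::map<TextureKey, Texture> textures_;
};
```

Because a part only carries a key, adding more textures later cannot leave it with a dangling reference: the lookup happens when the part is drawn, and a missing texture shows up as a null result instead of a crash.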
On a side note, I remembered a mail from Fabien from early 2006 stating that he corrected a bug in the SCSP that was responsible for stopped / choppy video display … I applied his correction to Saturnin (it was about time ^^)
We’ll see later if it really changes something.
Cd block is done
The cdblock is now finished, I only had one bug left in my SPTI code which was quickly corrected after some testing.
I’m quite happy as it went smoothly, that was somewhat unexpected 🙂
Update : I did some testing without the VDP2 activated, and the SPTI code works great ! Without backgrounds it’s not really interesting, but sound is working for games, meaning that SPTI is fully functional 🙂
Now back to the VDP2 cache. (for real :p)
Code cleaning
Not much time lately, but I’m still advancing :
- all the access method code is now removed from the cdrom class, and added to the corresponding files. I took the opportunity to get rid of a bunch of unused code.
- the C code used to build the file system tree is now converted to more maintainable C++ / STL code.
The only thing left to do is to create the ReadTOC function in SPTI. It won’t be a problem, and I expect to finish it tonight. When that’s done, I’ll get back to the VDP2 cache problem 🙂
Done with the dll
The “dll compliant” code is in place 🙂
What does it mean :
- the wnaspi32.dll isn’t loaded at the start, it’s only loaded when needed (reading cdrom system id, displaying the cd drive list, etc.)
- when you choose the access method to the cd drive (ASPI or SPTI), Saturnin asks you to choose the correct drive from a list. When using ASPI the SCSI address is displayed (1:0:0 for instance), while the drive letter is displayed when using SPTI (E: for instance)
- all the cd access code is now split into separate files, which means that very little work is needed to switch to a full dll application. If I had a little more spare time, I would do a SPTI dll for Satourne 😉
Now that the harder part is done, let’s get to the longer one :
- creating the missing SPTI functions, not many are missing (read TOC, and a few others)
- converting the ASPI functions still in the cdrom class (same as above : read TOC and a few more)
- converting some of the cdrom functions to full C++ and STL, as they were coded by Fabien in C originally and aren’t compatible anymore with my code …
That is starting to look pretty good !
All this will need extensive testing once the cache problem is solved 😀
Configuration saving trouble
Got the basic SPTI routines running. Now Saturnin can select a drive, save it by its letter, and read data from the cd inserted. The structures are now shared between ASPI and SPTI, and the drive letter is saved (instead of the id)
However there’s a problem : this technique cannot be used for ASPI, as there’s no way to correctly map a Windows drive letter to the internal SCSI configuration.
How does it work :
SPTI :
- you get a handle to a drive by its letter (D: for example)
- you use this handle to get the SCSI address of the drive (bus/target/lun)
- and then you use ioctl to get the data configuration from the drive, using the SCSI address
ASPI :
- you scan the SCSI chain, testing for each address if it’s a cdrom drive or not
- if that’s the case you save the SCSI address of the current drive
- you get the logical drive list from Windows and try to map it to the SCSI address
The problem with ASPI is that you don’t have a function to map the drive letter to the SCSI address … you can get a list of logical drives, but you can only guess its mapping. It won’t work for people changing the order of the drive letters in Windows.
Quick example :
Windows letter | SCSI address | Position |
---|---|---|
D: (cdrom) | 0/0/0 | 0 |
E: (dvdrom) | 0/1/0 | 1 |
F: (virtual drive) | 0/2/0 | 2 |
Windows letter | SCSI address | Position |
---|---|---|
B: (virtual drive) | 0/2/0 | 0 |
D: (cdrom) | 0/0/0 | 1 |
E: (dvdrom) | 0/1/0 | 2 |
Using ASPI, as we only have the position in the SCSI chain, the letter mapping won’t be accurate …
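The two tables above can be replayed with a small sketch of such a position-based guess. The function name `guessLettersByPosition` is made up for the illustration, and the guess simply pairs the n-th SCSI address of the chain with the n-th drive letter in alphabetical order, which is an assumption about how the guess would be done.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Pairs each SCSI address (in chain order) with the drive letter found at the
// same position in the alphabetically sorted letter list. This is only a guess:
// nothing ties a position in the chain to a letter.
std::map<std::string, char> guessLettersByPosition(
    const std::vector<std::string>& scsiAddresses,   // as found when scanning the chain
    std::vector<char> letters)                       // logical drive letters from Windows
{
    std::sort(letters.begin(), letters.end());
    std::map<std::string, char> guess;
    const std::size_t count = std::min(scsiAddresses.size(), letters.size());
    for (std::size_t i = 0; i < count; ++i) {
        guess[scsiAddresses[i]] = letters[i];
    }
    return guess;
}
```

With the first table (letters D, E, F) the guess happens to be right: 0/0/0 is paired with D:. With the second table (letters B, D, E) the same code pairs 0/0/0 with B:, although the cdrom at that address is really D:, and every other drive is shifted as well.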
I did look into FrogAspi, as it was supposed to be an ASPI replacement, and much to my regret it’s also based on ioctl, so it’s not usable on Windows 9X systems 🙁
Update : I’ve decided to get round this problem by saving the SCSI address in ASPI mode. The only drawback being that you won’t have a letter displayed in the drives list when you’re in ASPI mode …