Thanks!

Good to see the points connected (nearly! ). I don't know how easy it is to do, but currently the lines are 'thickly' connected:

Code: Select all

```
#
#
##
#
#
```

Code: Select all

```
#
#
#
#
#
```

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

Thanks!

Good to see the points connected (nearly! ). I don't know how easy it is to do, but currently the lines are 'thickly' connected:

Code: Select all

```
#
#
##
#
#
```

Code: Select all

```
#
#
#
#
#
```

Hi, yes that change was possible. The updated code is here, and now I've patched up the top and bottom missing pixels.

I think the above fix to avoid "thickly" drawn lines might be wasteful by occasionally plotting two pixels on top of each other. To optimise it, I think we only need to start plotting points when travelling sideways is if we travel sideways by two pixels or more. Then we only need to plot the number of pixels travelled sideways minus 1. I can't think of an elegant way to do that.

Plus, there's still some nasty edge cases that need patching up somehow, e.g.this bug where the increment of X1B or X2B leaks out and goes into an infinite loop. That might need trapping somehow, or the algorithm fixing.

I think the "half pixel" logic which can be used to make Bresenham algorithms draw nicer lines/circles (i.e. when drawing a straight line from (x1,y1) to (x2,y2) should we really calculate from (x1+0.5,y1+0.5) to (x2+0.5,y2+0.5) to get a nicer effect?) needs checking. I've just guessed and hacked at that currently!

Edit: another issue of concern is the expression DD%=AA%*(BB%-YSQUARED%), where AA%=A%*A%. So here we have a power of 4 of the original quantities, A%*A%*B%*B%. So if the original quantities are 8 bits wide, then we need 32 bits just to store DD%. But if the original quantities are 32 bits wide, (e.g. on higher-resolution systems) then we would need 128 bits to hold DD%. But maybe that is a price worth paying as potentially this integer-only implementation is potentially precise, i.e. potentially has no rounding errors or approximations ever.

Mike

I think the above fix to avoid "thickly" drawn lines might be wasteful by occasionally plotting two pixels on top of each other. To optimise it, I think we only need to start plotting points when travelling sideways is if we travel sideways by two pixels or more. Then we only need to plot the number of pixels travelled sideways minus 1. I can't think of an elegant way to do that.

Plus, there's still some nasty edge cases that need patching up somehow, e.g.this bug where the increment of X1B or X2B leaks out and goes into an infinite loop. That might need trapping somehow, or the algorithm fixing.

I think the "half pixel" logic which can be used to make Bresenham algorithms draw nicer lines/circles (i.e. when drawing a straight line from (x1,y1) to (x2,y2) should we really calculate from (x1+0.5,y1+0.5) to (x2+0.5,y2+0.5) to get a nicer effect?) needs checking. I've just guessed and hacked at that currently!

Edit: another issue of concern is the expression DD%=AA%*(BB%-YSQUARED%), where AA%=A%*A%. So here we have a power of 4 of the original quantities, A%*A%*B%*B%. So if the original quantities are 8 bits wide, then we need 32 bits just to store DD%. But if the original quantities are 32 bits wide, (e.g. on higher-resolution systems) then we would need 128 bits to hold DD%. But maybe that is a price worth paying as potentially this integer-only implementation is potentially precise, i.e. potentially has no rounding errors or approximations ever.

Mike

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

This draws horizontal lines to make the ellipse thinly connected, only drawing each pixel once:

here

here

Great, and thank you, Toby.

Next, I'm worried about how to solve this bug (which you first noticed) where the iteration to solve X1B and X2B never terminates, when we have very narrow (A%=1) tall slanted ellipses. I'm not sure how to fix that.

I spent a couple of hours looking at this bug yesterday but couldn't crack it. I think it's something to do with this equation which I "squared both sides of" has now doubled the number of solutions (since squaring equations generally does that doubling), i.e. we now have a quadratic equation for X1B and another quadratic equation for X2B.

When before I casually said "Then we do a trial calculation of D1. If this is positive, then X1 is too large and we need to reduce it by 1", I think "reducing X1B by 1" is jumping right over a local extremum of the quadratic described by D1 (in the solution space we are searching in, to solve the equation D1=0), and then starts climbing up the other side of that parabola of that quadratic. On that side of the quadratic, D1 just keeps on increasing, so we always find D1>0, and so we just always keep reducing X1B by 1 - hence the infinite loop ending in a "Too Big" error.

A quick hack fix is to check we always ensure X2B<X1B but then that doesn't seem to follow the exact true slant of the sheared ellipse - it misses the top point of the ellipse by a couple of pixels.

I'll come back to this in a few days, unless someone else can kindly crack it?

Next, I'm worried about how to solve this bug (which you first noticed) where the iteration to solve X1B and X2B never terminates, when we have very narrow (A%=1) tall slanted ellipses. I'm not sure how to fix that.

I spent a couple of hours looking at this bug yesterday but couldn't crack it. I think it's something to do with this equation which I "squared both sides of" has now doubled the number of solutions (since squaring equations generally does that doubling), i.e. we now have a quadratic equation for X1B and another quadratic equation for X2B.

When before I casually said "Then we do a trial calculation of D1. If this is positive, then X1 is too large and we need to reduce it by 1", I think "reducing X1B by 1" is jumping right over a local extremum of the quadratic described by D1 (in the solution space we are searching in, to solve the equation D1=0), and then starts climbing up the other side of that parabola of that quadratic. On that side of the quadratic, D1 just keeps on increasing, so we always find D1>0, and so we just always keep reducing X1B by 1 - hence the infinite loop ending in a "Too Big" error.

A quick hack fix is to check we always ensure X2B<X1B but then that doesn't seem to follow the exact true slant of the sheared ellipse - it misses the top point of the ellipse by a couple of pixels.

I'll come back to this in a few days, unless someone else can kindly crack it?

Is not EOR compatible.

Code: Select all

```
10 REM Bresenhams Filled Circle Drawing Algorithm
20 MODE1
30 VDU29,640;512;
40
50 C%=3
60 TIME=0
70 FOR R%=504 TO 0 STEP -24
80 GCOL0,C%:IF C%=0 THEN C%=3 ELSE C%=C%-1
90 PROCfilledcircle
100 NEXT
110 PRINTTIME
120 END
130
140 DEFPROCfilledcircle
150 X%=0:Y%=R%
160 D%=3-R%*2
170 REPEAT
180 MOVEY%,X%:DRAW-Y%,X%
190 MOVEY%,-X%:DRAW-Y%,-X%
200 IF D%<0 THEN X%=X%+4:D%=D%+6+X%*4 ELSE MOVEX%,Y%:DRAW-X%,Y%:MOVEX%,-Y%:DRAW-X%,-Y%:X%=X%+4:Y%=Y%-4:D%=D%+10+(X%-Y%)*4
210 UNTIL X%>Y%
220 ENDPROC
```

Elk rev 4 & Plus1

BBC B rev 7 & Cumana Disc Drive

BBC micro:bit

RaspberryPi B+ & 3B & Zero

Abit BP6

BBC B rev 7 & Cumana Disc Drive

BBC micro:bit

RaspberryPi B+ & 3B & Zero

Abit BP6

I've fixed the bug in the case of very narrow slanted ellipses causing an infinite loop now (e.g. A%=1, B%=40, S%=60).

I've worked this out properly now and am satisfied it covers all cases. It just needed the clauses AND T%>0 or OR T%<0 adding. The equation T%=0 is the straight line which runs up through the centre of the slanted ellipse. The AND T%>0 stops X1 crossing over this central line. The OR T%<0 stops the midpoint method crossing over to the wrong root of the quadratic.

I've also renamed some variables (X1B% is now X2% and X2B% is now X1%) as I felt I could read the code easier if X1<X2.

Code

I think the code could still be tided up, plus there's an issue about rounding to half-pixels that needs tidying up, especially around the line y=0 where a double pixel is often seen (Example)

Anyway, apart from that, this version seems to be an integer-only, and square-root free and division free version of an ellipse plotter.

Unfortunately this code still requires to handle large integers due to the power of 4 terms. So if our screen size is 256by256 pixels then we would need integers up to magnitude 256^4. If the ellipse goes off the screen (so x>256) we might overflow the BASIC integer size. Hence it might be necessary to use floats anyway (or 64-bit or 128-bit ints) for the workspace variables in this code.

If anyone can find any bugs then let me know (other than the 32-bit limit on integers being exceeded).

I've worked this out properly now and am satisfied it covers all cases. It just needed the clauses AND T%>0 or OR T%<0 adding. The equation T%=0 is the straight line which runs up through the centre of the slanted ellipse. The AND T%>0 stops X1 crossing over this central line. The OR T%<0 stops the midpoint method crossing over to the wrong root of the quadratic.

I've also renamed some variables (X1B% is now X2% and X2B% is now X1%) as I felt I could read the code easier if X1<X2.

Code

I think the code could still be tided up, plus there's an issue about rounding to half-pixels that needs tidying up, especially around the line y=0 where a double pixel is often seen (Example)

Anyway, apart from that, this version seems to be an integer-only, and square-root free and division free version of an ellipse plotter.

Unfortunately this code still requires to handle large integers due to the power of 4 terms. So if our screen size is 256by256 pixels then we would need integers up to magnitude 256^4. If the ellipse goes off the screen (so x>256) we might overflow the BASIC integer size. Hence it might be necessary to use floats anyway (or 64-bit or 128-bit ints) for the workspace variables in this code.

If anyone can find any bugs then let me know (other than the 32-bit limit on integers being exceeded).

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

This looks good! I've not found any new bugs. In MODE 0 the resolution is 320x256, so I thought that a wide ellipse on screen might overflow the BASIC integer size... but no - the A,B values are only half the full diameter of each axis of course. If A and B values can go up to 255, the full axis length (diameter) is up to 510, which is larger than the 320 screen width. So the code should work for ellipses that are a bit larger than the screen at least.

Thanks! Thanks for the big test. I suspect we'd still run into trouble if someone plots huge ellipse with its centre off screen (e.g. so that 90% of the ellipse is clipped). I think the only safe way forward might be to stick to floats, which kind of defeats the whole point of having done this? Obviously the goal would have been to convert this into assembly or C code. I was motivated to do this because I saw the sqrt in the C code of central loop of the ellipse-routine in MatrixBrandy (see draw_ellipse(...)). How many cycles does a sqrt take on a modern CPU anyway, compared to a basic add operation?

Anyway, I'll work on that central point bug next...

Another cool thing to do would be to bunch together all of the vertical line-segments, and replace them by a single vertical DRAW statement. That might save quite a bit of time. Also, all of the multiplications could be replaced by cumulative additions - but I gather modern C compiles can work out how to do that kind of optimisation for themselves anyway - which is pretty awesome.

Anyway, I'll work on that central point bug next...

Another cool thing to do would be to bunch together all of the vertical line-segments, and replace them by a single vertical DRAW statement. That might save quite a bit of time. Also, all of the multiplications could be replaced by cumulative additions - but I gather modern C compiles can work out how to do that kind of optimisation for themselves anyway - which is pretty awesome.

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

The centre point can be anywhere, that only affects the VDU 29,640;512; at the start. But very large ellipses would be a problem yes. floats on modern CPUs are pretty fast, and SQRT is often a single instruction I believe. But this routine would be useful for implementing a 6502 assembly version.

Ok Toby, let's aim for a 6502 implementation. We need to 1. Fix that bug (I will do). 2. Then optimise the code by replacing multiplications by adds (I can do). 3. Optimise drawing of vertical line segments into single DRAW statements (any ideas on that? Should be do-able). 4. Then we can go for a 6502 implementation (I assume you can do that?!). I'll get back to you with that bug-fix asap, please hold on - I'm busy at work at the moment.

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

OK, I'll have a go at a 6502 implementation.mike12f wrote: ↑Sun Nov 21, 2021 6:32 pmOk Toby, let's aim for a 6502 implementation. We need to 1. Fix that bug (I will do). 2. Then optimise the code by replacing multiplications by adds (I can do). 3. Optimise drawing of vertical line segments into single DRAW statements (any ideas on that? Should be do-able). 4. Then we can go for a 6502 implementation (I assume you can do that?!). I'll get back to you with that bug-fix asap, please hold on - I'm busy at work at the moment.

Fixing that bug with the anomaly at the centre was quite hard. I had to go back to the original floating point square-root method (which was simpler to analyse) to come up with a fix (with floating point sqrt). Hopefully that version is easier to understand than the integer implementation. There's a lot of cases to consider in there - if the initial gradient's magnitude is less than 1 then we should skip the middle line of pixels (around y=0) when reflecting from top to bottom. Also the lower half needs shifting left or right one pixel depending on the sign of the shear S%. If the initial gradient's magnitude is greater than one then we don't skip the middle row of pixels and don't do any shifting. Quite strange. If you notice any combinations of gradients that cause problems then let me know.

To check the pixels were roughly in the right place, it was useful to go back to this version for comparison of what we are aiming for, which simply avoids doing any reflection (but is twice as computationally expensive) for comparison.

Anyway, the integer implementation with the fix for this anomaly is here.

I'm working on the grouping vertical lines of pixels next. Nearly there.

To check the pixels were roughly in the right place, it was useful to go back to this version for comparison of what we are aiming for, which simply avoids doing any reflection (but is twice as computationally expensive) for comparison.

Anyway, the integer implementation with the fix for this anomaly is here.

I'm working on the grouping vertical lines of pixels next. Nearly there.

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

Looks good. I'm making progress on the 6502 version.

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

It's good that we have a working version that doesn't optimise for vertical lines, as this can be adjusted to implement a filled ellipse.

Hi Toby,

I made a lot of fuss over that central pixel issue. I've found bugs in BBC basic's ellipse, because they don't have that issue sorted out fully themselves. See here for an example. Also you get similar issues in MatrixBrandy Basic, which followed the code developed at the top of this thread. I think this shows a small artifact error at the middle of the line:
Anyway, it seems all implementations are struggling with this minor pixel issue at the centre of the ellipse caused by reflecting the top of the ellipse down to the bottom, to save having to make the expensive computation for the bottom half of the ellipse. Basically, if the shear gradient is shallow (<1 in magnitude) then you have to skip the middle row of pixels when you do the reflection, otherwise you should not skip that middle row. I don't think the other implementations are doing that. BBC basic never skips the middle row (hence weird results when drawing very shallow gradients), and MatrixBrandy Basic always skips it (hence weird results when drawing very steep gradients).

Thanks for working on the 6502 version. Do you need help getting rid of the multiplications? It should be possible to replace all of them by adds. I'm happy to skip the vertical line drawing. But I was assuming there the overhead to call the DRAW routine is quite high for each call, and I was thinking we should avoid that overhead if possible.

It still seems annoying that we need 4 bytes to hold coordinates for the sake of calculating final coordinates that are really only one byte wide. I have been thinking about whether things can be optimised so you only need 2 bytes. E.g. instead of solving for X1 and X2 fully, we could solve for just the square root term D=A/B*SQRT(B*B-Y*Y) using the same iterative method, and just add in the x_shear coordinate (where x_shear=(X1+X2)/2), to calculate X1 and X2. This might mean you only need 16 bits for each integer. But we couldn't truncate D down to an integer- we'd have to use a few fixed-point bits of decimal fractions in there, so when x_shear+D is calculated then you don't get random pixel horizontal wobble. But then we might be heading back to 32 bits in total, defeating the point of the "improvement"! Anyway, I'm still thinking about this, so just letting you know before you spend too much time on 6502 right now. Don't want you to get there and find it's slower than the original version. Maybe you can answer this question of whether it's worth considering or not?

I made a lot of fuss over that central pixel issue. I've found bugs in BBC basic's ellipse, because they don't have that issue sorted out fully themselves. See here for an example. Also you get similar issues in MatrixBrandy Basic, which followed the code developed at the top of this thread. I think this shows a small artifact error at the middle of the line:

Code: Select all

```
10MODE 4
20VDU 29,640;512;
30A%=0
40B%=15
50S%=3
60MOVE 0,0
70MOVE A%*4,0
80PLOT &C5,S%*4,B%*4
```

Thanks for working on the 6502 version. Do you need help getting rid of the multiplications? It should be possible to replace all of them by adds. I'm happy to skip the vertical line drawing. But I was assuming there the overhead to call the DRAW routine is quite high for each call, and I was thinking we should avoid that overhead if possible.

It still seems annoying that we need 4 bytes to hold coordinates for the sake of calculating final coordinates that are really only one byte wide. I have been thinking about whether things can be optimised so you only need 2 bytes. E.g. instead of solving for X1 and X2 fully, we could solve for just the square root term D=A/B*SQRT(B*B-Y*Y) using the same iterative method, and just add in the x_shear coordinate (where x_shear=(X1+X2)/2), to calculate X1 and X2. This might mean you only need 16 bits for each integer. But we couldn't truncate D down to an integer- we'd have to use a few fixed-point bits of decimal fractions in there, so when x_shear+D is calculated then you don't get random pixel horizontal wobble. But then we might be heading back to 32 bits in total, defeating the point of the "improvement"! Anyway, I'm still thinking about this, so just letting you know before you spend too much time on 6502 right now. Don't want you to get there and find it's slower than the original version. Maybe you can answer this question of whether it's worth considering or not?

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

Hi Mike,

It's interesting that other implementations don't correct the centre pixel issue. The performance overhead for the fix doesn't seem too crazy. The other complication/tradeoff is whether to allow odd diameter sizes. e.g. you can draw a circle of diameter 10 pixels with this ellipse routine (A=B=5), but not diameter 9 pixels. Whether this is worth fixing is debatable.

I have an initial version of the 6502 code (without your centre point fix yet) here. The ellipse.ssd file draws an example ellipse. The asm directory has the source assembly.

Alternatively, the fastest method of all is to write to screen memory directly, bypassing the OS calls.

It's interesting that other implementations don't correct the centre pixel issue. The performance overhead for the fix doesn't seem too crazy. The other complication/tradeoff is whether to allow odd diameter sizes. e.g. you can draw a circle of diameter 10 pixels with this ellipse routine (A=B=5), but not diameter 9 pixels. Whether this is worth fixing is debatable.

I have an initial version of the 6502 code (without your centre point fix yet) here. The ellipse.ssd file draws an example ellipse. The asm directory has the source assembly.

I got rid of one or two (see ellipse_out.txt), but if you can get rid of more that would certainly help performance. 32 bit multiplications are expensive on a 6502.

Yes vertical line draws could still be a performance win. The OS calls for plotting take 25% of the execution time at the moment, which will become a higher percentage if we can remove more multiplications. So reducing the number of calls would help performance.

Alternatively, the fastest method of all is to write to screen memory directly, bypassing the OS calls.

It is worth some thought, but I have a sneaking (very non-scientific) suspicion 32 bits will be required somewhere.mike12f wrote: ↑Thu Nov 25, 2021 11:46 amIt still seems annoying that we need 4 bytes to hold coordinates for the sake of calculating final coordinates that are really only one byte wide. I have been thinking about whether things can be optimised so you only need 2 bytes. E.g. instead of solving for X1 and X2 fully, we could solve for just the square root term D=A/B*SQRT(B*B-Y*Y) using the same iterative method, and just add in the x_shear coordinate (where x_shear=(X1+X2)/2), to calculate X1 and X2. This might mean you only need 16 bits for each integer. But we couldn't truncate D down to an integer- we'd have to use a few fixed-point bits of decimal fractions in there, so when x_shear+D is calculated then you don't get random pixel horizontal wobble. But then we might be heading back to 32 bits in total, defeating the point of the "improvement"! Anyway, I'm still thinking about this, so just letting you know before you spend too much time on 6502 right now. Don't want you to get there and find it's slower than the original version. Maybe you can answer this question of whether it's worth considering or not?

The bug will be in the BBC MOS, as BASIC itself has no input in the way the ellipse is drawn. RISC OS's implementation will likely have been based on the BBC MOS version, and the version in this thread and in Matrix Brandy is an ARM-to-C translation of the RISC OS implementation.mike12f wrote: ↑Thu Nov 25, 2021 11:46 amHi Toby,

I made a lot of fuss over that central pixel issue. I've found bugs in BBC basic's ellipse, because they don't have that issue sorted out fully themselves. See here for an example. Also you get similar issues in MatrixBrandy Basic, which followed the code developed at the top of this thread.

Matrix Brandy BASIC VI (work in progress)

- Richard Russell
**Posts:**2621**Joined:**Sun Feb 27, 2011 10:35 am**Location:**Downham Market, Norfolk-
**Contact:**

I worry about prioritising the

I am suffering from 'cognitive decline' and depression. If you have a comment about the style or tone of this message please report it to the moderators by clicking the exclamation mark icon, rather than complaining on the public forum.

Toby, here is a version with no multiplications required in the central loop! Code. It's modified from your github basic program so if you apply the changes and diff them you'll see the edits I've made. Now in the central iterative loops that update X1 and X2, we have five or six 32-bit adds/subtracts, which I guess is more efficient than squaring a 32bit number to calculate T%*T%? I've tried to name the variables consistently, so TT1% is short for T1%*T1%.

Yes I agree this might be hard to avoid 32 bit manipulations. We could get the central line around x_shear calculated with one fixed-point divide instead of doing any Bresenham-style iterations though, which would mean we could then deduce X1 from X2 without any further iterations - potentially halving the algorithm time.

I couldn't get the SSD image from your github to run under jsbeeb. Is it up-to date? When I ran *EXEC !BOOT it just hung.

Richard and Soruk - I'm not sure what to do about that anomaly, it's strange to me. Did people already know about this trade-off? I suspect keeping the size of the image consistent may not be so important because apparently RiscOS misses out a central row but BBC-Master MOS does not, so they didn't consider keeping the sizes consistent ultimately important when they made that switch. Soruk we need to test my anomalous example under RPCEmu and see how that does it.

Edit: Just updated the code a little to remove some more multiplications.

Yes I agree this might be hard to avoid 32 bit manipulations. We could get the central line around x_shear calculated with one fixed-point divide instead of doing any Bresenham-style iterations though, which would mean we could then deduce X1 from X2 without any further iterations - potentially halving the algorithm time.

I couldn't get the SSD image from your github to run under jsbeeb. Is it up-to date? When I ran *EXEC !BOOT it just hung.

Richard and Soruk - I'm not sure what to do about that anomaly, it's strange to me. Did people already know about this trade-off? I suspect keeping the size of the image consistent may not be so important because apparently RiscOS misses out a central row but BBC-Master MOS does not, so they didn't consider keeping the sizes consistent ultimately important when they made that switch. Soruk we need to test my anomalous example under RPCEmu and see how that does it.

Edit: Just updated the code a little to remove some more multiplications.

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

Works now. I think I was probably using JSBEEB wrong previously. Well done on getting it working in 6502 so quickly. Do you think the adds will be easy to implement and produce a decent speed up? Don't bother adding that centre-point fix yet, it seems currently controversial and may not be needed.

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

Shouldn't be too hard, working on it now. I'm hopeful for a good speedup.mike12f wrote: ↑Thu Nov 25, 2021 7:47 pmWorks now. I think I was probably using JSBEEB wrong previously. Well done on getting it working in 6502 so quickly. Do you think the adds will be easy to implement and produce a decent speed up? Don't bother adding that centre-point fix yet, it seems currently controversial and may not be needed.

Toby, I've removed one more pair of variables, and possibly a multiplication too. Sorry for not spotting this earlier. New version

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

No problem - I've updated the 6502 version, and it is three times faster than the previous 6502 version, and about 10 times faster than the BASIC version. Which is a very nice improvement. Now calling the OS to draw lines is 55% of the total runtime.

jsbeeb link here

Great! That works well and is fast. Here is the BBC basic version of the same ellipse for comparison. However I can't see that it's 10 times slower, are you sure it is?

I guess there are more optimisation we could do. A simple one might be the last line, i.e. when Y%=B% in the main for loop. We could change the main for loop to run from 0 to B%-1, and then the last line doesn't need an iteration to solve. That line would have been the most expensive iteration too, because the last line is always the longest horizontal one, so the sqrt iteration will just be doing lots of X2=X2-1 for no reason. Then we could get rid of that final plot69 and merge it into the final horizontal DRAW, possibly from OX1+1 to OX2-1. A version showing that is here (Edit, now with DX=SGN(X2-X1) fix added).

Are you going to bother doing screen-memory poking? That would speed it up probably (I expect each call to DRAW as it is costs 1 multiplication and would could avoid that if we did screen memory poking), but is it worth it, as would you have to deal with different screen modes separately/shadow ram worries, etc?? Any idea whether the MOS ellipse routine would be doing direct screen-memory updating? Is it worth it going further? We've proven the concept now! Nice.

I've seen the assembler code and it looks neat. What tool is used to compile that? I only know the built-in Basic assembler.

I guess there are more optimisation we could do. A simple one might be the last line, i.e. when Y%=B% in the main for loop. We could change the main for loop to run from 0 to B%-1, and then the last line doesn't need an iteration to solve. That line would have been the most expensive iteration too, because the last line is always the longest horizontal one, so the sqrt iteration will just be doing lots of X2=X2-1 for no reason. Then we could get rid of that final plot69 and merge it into the final horizontal DRAW, possibly from OX1+1 to OX2-1. A version showing that is here (Edit, now with DX=SGN(X2-X1) fix added).

Are you going to bother doing screen-memory poking? That would speed it up probably (I expect each call to DRAW as it is costs 1 multiplication and would could avoid that if we did screen memory poking), but is it worth it, as would you have to deal with different screen modes separately/shadow ram worries, etc?? Any idea whether the MOS ellipse routine would be doing direct screen-memory updating? Is it worth it going further? We've proven the concept now! Nice.

I've seen the assembler code and it looks neat. What tool is used to compile that? I only know the built-in Basic assembler.

Another issue with screen poking would mean it either won't work across the Tube or if using the right OSWORD calls you're negating the speed advantage of bypassing the OS.mike12f wrote: ↑Fri Nov 26, 2021 11:17 amAre you going to bother doing screen-memory poking? That would speed it up probably (I expect each call to DRAW as it is costs 1 multiplication and would could avoid that if we did screen memory poking), but is it worth it, as would you have to deal with different screen modes separately/shadow ram worries, etc??

Matrix Brandy BASIC VI (work in progress)

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

Ah - I meant compared with your BASIC code, not the ellipse PLOT code built in to the Master OS / the version in the Graphics Extension ROM (GXR) for the BBC Micro. We are still a little bit slower than these versions.

I'll take a look.mike12f wrote: ↑Fri Nov 26, 2021 11:17 amI guess there are more optimisation we could do. A simple one might be the last line, i.e. when Y%=B% in the main for loop. We could change the main for loop to run from 0 to B%-1, and then the last line doesn't need an iteration to solve. That line would have been the most expensive iteration too, because the last line is always the longest horizontal one, so the sqrt iteration will just be doing lots of X2=X2-1 for no reason. Then we could get rid of that final plot69 and merge it into the final horizontal DRAW, possibly from OX1+1 to OX2-1.

There's something nice about having it "OS legal" I suppose.mike12f wrote: ↑Fri Nov 26, 2021 11:17 amAre you going to bother doing screen-memory poking? That would speed it up probably (I expect each call to DRAW as it is costs 1 multiplication and would could avoid that if we did screen memory poking), but is it worth it, as would you have to deal with different screen modes separately/shadow ram worries, etc?? Any idea whether the MOS ellipse routine would be doing direct screen-memory updating? Is it worth it going further? We've proven the concept now! Nice.

I use the acme assembler. This creates a binary file that I then add to the final SSD using the python script image.py.

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

I updated with your 'final line' optimisation. here again

I'm thinking we can reduce the number of calls to the OS further by identifying straight lines in one of eight directions. If DX,DY is the change in X,Y from one point to the next then we only need to draw a line when either DX or DY changes. If DX and DY stays the same then the line extends further in the same direction. This will catch horizontal, vertical and 45 degree lines. I'll have a think about that.

I'm thinking we can reduce the number of calls to the OS further by identifying straight lines in one of eight directions. If DX,DY is the change in X,Y from one point to the next then we only need to draw a line when either DX or DY changes. If DX and DY stays the same then the line extends further in the same direction. This will catch horizontal, vertical and 45 degree lines. I'll have a think about that.

So how is the MOS version doing it so fast? I thought we'd beat it. If it's not having the overhead of making repeated completely new DRAW statements then that could explain it?

I thought about that 45degree method and couldn't work out how to do it elegantly. I think it would be so much easier if we had a pointer to screen memory - moving up one pixel is just a simple 16-bit add statement then.

(If we were going for brute force speed, while still calling the DRAW functions, then I don't think the human eye can notice if the ellipse is approximated by quite long line segments, so we could say simply "y_step_size=max(1,int(ellipse_height/20))" to easily speed things up on large ellipses, with very little coding effort; but I guess that's cheating. Apparently the Elite circle routine uses relatively few straight-line segments, so in practice, if we just want fast curves, you can probably get away with this.)

I thought about that 45degree method and couldn't work out how to do it elegantly. I think it would be so much easier if we had a pointer to screen memory - moving up one pixel is just a simple 16-bit add statement then.

(If we were going for brute force speed, while still calling the DRAW functions, then I don't think the human eye can notice if the ellipse is approximated by quite long line segments, so we could say simply "y_step_size=max(1,int(ellipse_height/20))" to easily speed things up on large ellipses, with very little coding effort; but I guess that's cheating. Apparently the Elite circle routine uses relatively few straight-line segments, so in practice, if we just want fast curves, you can probably get away with this.)

- TobyLobster
**Posts:**201**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

Exactly so. We have to call the OS 6 times for each move or draw command, which is a big overhead. The OS can call internal routines without that overhead.

Yes and no. The screen memory of a BBC Micro/Master isn't a nicely arranged linear address space but is segmented into character rows. To go up one pixel you decrement the address by one, but every eighth pixel you have to subtract a large amount to get to the previous character row. It make all direct writing to the screen more complex. Moving left and right isn't easy either. Also clipping to the current graphics window is an issue to deal with, as is GCOL modes for how to render each pixel.

Yes, I think that's cheating . I think it does look better when pixel accurate.