Comments (232)

gyurco avatar gyurco commented on July 25, 2024 3

Meanwhile I've progressed a lot on the STe GST MCU (based on the original schematics). I've wired in Till's shifter, and it produces video in simulation. You can take a look at:
https://github.com/gyurco/gstmcu

from mist-board.

gyurco avatar gyurco commented on July 25, 2024 2

Hmm probably it won't work then:
14: dout = ymreg[7][6] ? ymreg[14] : IOA_in;
But maybe changing to:
14: dout = ymreg[7][6] ? ymreg[14] & IOA_in: IOA_in;
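The two variants only differ in what a read of register 14 returns when bit 6 of register 7 is set. A small Python model of the two expressions, as a sketch only: it assumes that bit 6 of register 7 is the port A direction bit (1 = output), and the function and parameter names are made up for illustration.

```python
def read_port_a(reg7: int, reg14: int, ioa_in: int, and_with_pins: bool = True) -> int:
    """Model of the value returned when register 14 (port A) is read.

    and_with_pins=False models the first expression (return the register),
    and_with_pins=True models the proposed change (register AND pin state).
    """
    port_a_is_output = (reg7 >> 6) & 1       # ymreg[7][6]
    if port_a_is_output:
        return (reg14 & ioa_in) if and_with_pins else reg14
    return ioa_in                            # input mode: pins are read directly

if __name__ == "__main__":
    # Port A set as output, register holds 0xF0, pins read back as 0x3C.
    print(hex(read_port_a(0x40, 0xF0, 0x3C, and_with_pins=False)))  # 0xf0
    print(hex(read_port_a(0x40, 0xF0, 0x3C, and_with_pins=True)))   # 0x30
```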

Meanwhile I think I found the issue with memory speed: an aligned read (0 wait states) followed by a write will introduce 3 wait states. However fixing this might need to re-arrange memory slots from CPU-VIDEO-CPU2-VIDEO2 to CPU-CPU2-VIDEO-VIDEO2.

Upd.: that worked. Cycle perfection on the way!
[image: Photo0037]

from mist-board.

sebdel avatar sebdel commented on July 25, 2024 1

Hey, I just wanted to say that, as someone who doesn't really understand these issues, I'm following this conversation with great interest. If you ever try to fix it, could you push it to a branch even if it's not working? I expect to learn a lot from this patch :)

from mist-board.

harbaum avatar harbaum commented on July 25, 2024 1

"phase shift is in degree (modulo 360), here phase shift value is -2500, that's why I called it a hack: it does insert a boot delay."

Quartus allows the phase shift to be specified as an absolute time (in ns or ps) or as an angle. The -2500 are picoseconds.

And that's not a boot delay. You wouldn't notice a 2.5-nanosecond boot delay. Instead, it's the delay between the edges of the two clocks.
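To relate the two units, here is a small Python sketch that converts a time-based shift into an angle. It assumes the shifted output is the 128MHz SDRAM clock mentioned elsewhere in this thread; the function names are made up for illustration.

```python
def period_ps(clock_mhz: float) -> float:
    """Clock period in picoseconds."""
    return 1e6 / clock_mhz

def shift_to_degrees(shift_ps: float, clock_mhz: float) -> float:
    """Express a phase shift given in picoseconds as an angle in [0, 360)."""
    return (shift_ps / period_ps(clock_mhz) * 360.0) % 360.0

def shift_as_positive_delay_ps(shift_ps: float, clock_mhz: float) -> float:
    """The same shift expressed as a non-negative delay within one period."""
    return shift_ps % period_ps(clock_mhz)

if __name__ == "__main__":
    # Assumption: -2500 ps applied to a 128 MHz clock (period 7812.5 ps).
    print(round(shift_to_degrees(-2500, 128), 1))   # 244.8
    print(shift_as_positive_delay_ps(-2500, 128))   # 5312.5
```

Under that assumption, -2500 ps is the same edge relationship as -115.2 degrees (244.8 degrees modulo 360), i.e. an edge roughly 5.3 ns into each 7.8 ns period.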

from mist-board.

gyurco avatar gyurco commented on July 25, 2024 1

Hopefully DMA is OK now; ACSI hard disks are working. I had an HD image with many games. They usually have a title screen added by the crackers, and those were garbled with TG68K. Now they look good!

from mist-board.

jotego avatar jotego commented on July 25, 2024 1

Guys, just to let you know that I am following all messages with a lot of excitement. The ST core was the reason I bought MiST and got involved in all this. I'm looking forward to having an enhanced core!

from mist-board.

gyurco avatar gyurco commented on July 25, 2024 1

Just committed two IKBD fixes to the firmware, even the TG68K version could benefit from it: unpause the mouse (Defender of the Crown, R-Type), and keyboard mouse emulation (Emanuelle).

from mist-board.

gyurco avatar gyurco commented on July 25, 2024 1

I've tried to make top and bottom overscan happen "naturally". It looks nice in demos and PacMan STE, but it is very scandoubler-unfriendly. Yeah, I see the problem: when you originally wrote it, the only information available was in emulator sources, and they weren't perfect at that time. Things like how the counters progress and which event happens when are still a mystery; at least I didn't find exact information about them. I remember it was really a headache in Genesis, too, but there they are available externally and could be determined by experiments. But opening the left and right borders requires perfect counters. I could somewhat deduce the vertical ones from demos, but not the horizontal ones.

Something is broken with color modes, but reloading the core (sometimes several times) usually solves it. It started right after I switched to time-based synthesis; I really must fix this.

Thanks for testing MIDI, at least something works :)

Upd: "natural" border effects are not scandoubler-unfriendly, but the sync adjuster doesn't like them (or vice versa). Currently it's solved by locking the horizontal counters for a frame, but I guess that won't be good for left and right border openings.

from mist-board.

renaudhelias avatar renaudhelias commented on July 25, 2024

Did you apply mist.sdc?
What about the final generated-clocks table in the timing-constraint results? It's a more meaningful report...

from mist-board.

jotego avatar jotego commented on July 25, 2024

I do not find a nice way of pasting results here and I am not sure about which report you are referring to but I think this summarizes the situation:

Generated clocks:
clock|altpll_component|auto_generated|pll1|clk[1] spec'ed at 32MHz, but reported FMax is 22MHz!
clock|altpll_component|auto_generated|pll1|clk[2] spec'ed at 128MHz, but reported FMax is 54MHz!

As you see, the device is missing the target by quite a lot. I am using the exact setup available on GitHub, which already applies the mist.sdc file. Sometimes the implementation seems to work on MIST, which makes me think that some of the data paths are actually false paths or multicycle paths and are not relevant. But because the tool tries to optimize false paths too, the true paths fall out of spec for some implementations. I do not know the MIST/Atari ST architecture well enough to start adding false/multicycle paths to the SDC file. The SDC file is currently too simple; a more detailed one is needed.
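To put the two FMax figures into time units, here is a rough Python sketch. It treats the reported FMax as the inverse of the longest path delay in that clock domain, which is a simplification of what a timing analyzer actually reports; the helper names are made up.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds."""
    return 1000.0 / freq_mhz

def slack_ns(target_mhz: float, fmax_mhz: float) -> float:
    """Rough worst-case setup slack: required period minus achievable period."""
    return period_ns(target_mhz) - period_ns(fmax_mhz)

if __name__ == "__main__":
    # (clock, target MHz, reported FMax MHz) as quoted in the comment above.
    for name, target, fmax in [("clk[1]", 32, 22), ("clk[2]", 128, 54)]:
        print(f"{name}: needs {period_ns(target):.2f} ns, "
              f"achieves {period_ns(fmax):.2f} ns, "
              f"slack {slack_ns(target, fmax):.2f} ns")
```

A negative slack means the slowest path in that domain is longer than one clock period, unless that path is in fact a false or multicycle path.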

I am trying to add a new device for the ST in the MIST core but synthesis fails to produce a working MIST too often. Trying to understand what happens, I found that the original MIST core has these timing issues.

from mist-board.

renaudhelias avatar renaudhelias commented on July 25, 2024

The PLL is sometimes hacked to reach low arcade clocks by lying about the clock input in the SDC file, which gives nicer timing equations. But that is infrequent (and is about 27/2 or 27/4, not 27/1.342...). Normally 27MHz should appear as 27MHz in the Generated clocks table; otherwise mist.sdc is not being taken into account correctly.

I don't remember exactly, but I think you have to add mist.sdc by right-clicking on "TimeQuest task" > configure (just between the Synthesis task and the Assemble task?).

from mist-board.

jotego avatar jotego commented on July 25, 2024

I have made some screen shots. The 27MHz clock frequency is observed in the clock table. The PLL is used to generate 32MHz and 128MHz and the RTL code is then using the 32MHz to generate 8MHz. There are many clocks unconstrained, though.

I am going to see if there are more complete mist.sdc files in prior versions. The current one is minimal. Time analysis shows issues in the video module and that is one of the problems I often found: corrupted video and MIST not powering up properly.

(Again, I am always referring to the MIST core, i.e. the Atari ST core)

[images: screenshot-1, screenshot-2, screenshot-3, screenshot-4]

from mist-board.

renaudhelias avatar renaudhelias commented on July 25, 2024

In CoreAmstrad I use :
--signal c0 : std_logic;--27MHz 20/135 =4MHz Z80/bootloader
--signal c1 : std_logic;--27MHz 125/135=25MHz VGA
--signal c2 : std_logic;--27MHz 572/135=114.4MHz SDRAM
--signal c3 : std_logic;--27MHz 1/2250 =12kHz keyboard
The keyboard's 12kHz is made in RTL, but the 4MHz is still pure PLL. Perhaps 40/135 is tolerated...
Can I have the equations (multiplier/divider) used here (in the MIST core)?

*My SDRAM controller is adapted for 114.4MHz; normally you just have to change the RASCAS_DELAY and CAS_LATENCY parameters in sdram.v to the ones in https://github.com/renaudhelias/CoreAmstrad/blob/master/BuildYourOwnZ80Computer/zsdram.v

Normally you can reach any MHz-range frequency using the PLL (by solving the equation: a common divider, and multipliers sharing as many small prime factors as possible (20 = 5 * 2 * 2; 125 = 5 * 5 * 5; 572 = 2 * 2 * 11 * 13)).
For kHz, do use RTL...
Adding a not(clock) does break the timing constraints; it is better to switch between rising_edge/falling_edge instead of adding not(clock). Personally I use a main "clock component" that wires out every "clock" and "not clock". It makes timing-constraint problems easier to solve (not(not(clock))).

from mist-board.

renaudhelias avatar renaudhelias commented on July 25, 2024

Another set of equations (latest CoreAmstrad), with a common multiplier this time:
c0 27MHz * 89/600 ≈ 4MHz (Z80) => perhaps 89/300 is fine for our project
c1 27MHz * 89/48 ≈ 50MHz (VGA)
c2 27MHz * 89/21 ≈ 114.4MHz (SDRAM)
c3 27MHz * 89/150 ≈ 16MHz (PWM)

You can make an Excel table with 27 at the top left, the multipliers along the top (1 2 3 4...), and the dividers down the left (1 2 3 4...), and put =$A$1*B$1/$A2 in B2. If you find a single column or a single row that contains all your clocks, you win.

600 = 5 * 5 * 3 * 2 * 2 * 2
48 = 3 * 2 * 2 * 2 * 2
21 = 3 * 7
150 = 5 * 5 * 3 * 2
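The same arithmetic can be done outside a spreadsheet. The Python sketch below reproduces the spreadsheet cell and derives the common-divider set (20/135, 125/135, 572/135) from the target frequencies. It only does the arithmetic: the limits a real PLL places on multipliers, dividers and VCO frequency are ignored, and the function names are made up.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

REF_MHZ = 27  # reference clock used in the equations above

def pll_output_mhz(mult: int, div: int) -> float:
    """The spreadsheet cell: reference * multiplier / divider."""
    return REF_MHZ * mult / div

def common_divider_settings(targets_mhz):
    """Return (divider, [multipliers]) that hit every target exactly."""
    ratios = [Fraction(str(t)) / REF_MHZ for t in targets_mhz]
    # Least common multiple of all reduced denominators.
    div = reduce(lambda a, b: a * b // gcd(a, b), (r.denominator for r in ratios))
    return div, [int(r * div) for r in ratios]

if __name__ == "__main__":
    print(common_divider_settings([4, 25, "114.4"]))   # (135, [20, 125, 572])
    # Common multiplier 89 with the dividers listed above.
    for div in (600, 48, 21, 150):
        print(div, round(pll_output_mhz(89, div), 3))
```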

from mist-board.

jotego avatar jotego commented on July 25, 2024

(I didn't know you were the author of Amstrad core. Let me thank you for that contribution. I really enjoy it!)

These are the PLL settings in MIST:

c0 128MHz -> 27/27*128
c1 32MHz -> 27/27*32
c2 128MHz (used as SDRAM_CLK) -> 27/27*128 with -2500 phase shift

I am not sure what the phase shift is about. Then they have a lot of clock dividers using RTL in the logic, like this one:

//// 8MHz clock ////
reg [1:0] clk_cnt;

always @ (posedge clk_32, negedge pll_locked) begin
    if (!pll_locked) begin
        clk_cnt <= 2'd2;
    end else begin 
        clk_cnt <= clk_cnt + 2'd1;
    end
end

assign clk_8 = clk_cnt[1];

I think that the FPGA tools may be failing to recognize signals like clk_8 as clocks, and they are definitely not deriving the right frequency constraint for them. If they are not recognized as clocks, they are not routed as clocks either, using the special clock tree routing inside the FPGA. Are you using any clocks of this sort?

Your comment about negated clocks causing problems also worries me. But you say that using posedge and negedge in the always statement is fine, isn't it?
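For reference, a behavioural Python model of the divider quoted above shows what frequency the tools would have to be told about. It is only a software model of the counter, not a statement about how Quartus treats the signal; the function name is made up.

```python
def simulate_clk_8(n_edges: int, reset_value: int = 2):
    """Return the value of clk_cnt[1] after each rising edge of clk_32."""
    clk_cnt = reset_value                  # value held while pll_locked is low
    samples = []
    for _ in range(n_edges):
        clk_cnt = (clk_cnt + 1) & 0b11     # 2-bit counter wraps around
        samples.append((clk_cnt >> 1) & 1) # clk_8 = clk_cnt[1]
    return samples

if __name__ == "__main__":
    print(simulate_clk_8(8))   # [1, 0, 0, 1, 1, 0, 0, 1]
```

The output repeats every four clk_32 edges (two high, two low), i.e. 32MHz / 4 = 8MHz. In SDC terms a signal like this is usually described as a generated clock with a divide-by-4 relationship to its source, so that the analyzer knows its frequency.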

from mist-board.

renaudhelias avatar renaudhelias commented on July 25, 2024

Respecting timing constraints is good practice. For complex components (like the processor clock) it is better to respect them.

phase_shift here seems to be just a hack to start the SDRAM before the core (a reset not implemented somewhere?); normally the reset signal is delayed by the ARM. I don't use phase_shift in CoreAmstrad, and I don't use pll_locked either (it's about the PLL stabilising at start-up); the "negedge pll_locked" seems to be used here as another hack to start the processor after the SDRAM...

So if you try adding a c3 at 8MHz (27 / 27 * 8), you will perhaps have to remove the pll_locked and phase_shift hacks... or else use pll_locked as the processor reset (not as a VHDL process stimulus: not as a clock, but as a simple value).

In my version of sdram.v (zsdram.v) I added a synchronization algorithm (commented "some synchro by here")... perhaps it will help to stabilize the SDRAM in case of problems...

"Your comment about negated clocks causing problems also worries me. But you say that using posedge and negedge in the always statement is fine, isn't it?"

Yes, it's fine, but a lot of big components come from OpenCores and should not be changed... (any patch has to be commented)

from mist-board.

robinsonb5 avatar robinsonb5 commented on July 25, 2024

The phase shift is on the external clock signal that goes to the SDRAM chip itself and it's there to help make sure the timing requirements of the chip are met, so that control and data signals are stable before the SDRAM sees the clock edge.

from mist-board.

renaudhelias avatar renaudhelias commented on July 25, 2024

It's the inverse: the SDRAM controller has a slow, standard full re-init/reset sequence at boot, sending several init commands to the SDRAM, and only becomes ready to receive data... after a certain time.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

I think you are confusing a few things.

First there's the hardware startup. When a fresh core has been loaded the FPGA and its PLLs need some time before the clocks are all stable at the right frequency. This is what the pll locked signal is about. A core should not do anything before the clocks are stable. Thus the CPU reset as well as e.g. the sdram controller usually waits for the locked signal to become true.

Then there's the sdram. Unlike dram the modern sdram needs to be initialized before it can be used. So sdram init happens before the cpu can be started but after the pll has locked. Then the cpu can finally be started by releasing its reset.

So the sequence should look like this: everything waits for the pll to lock. Then the sdram initializes. Once that is done, the CPU can start. In most cores the CPU's reset doesn't wait for the sdram controller. Instead, the reset is simply applied for a few milliseconds, while the SDRAM init only needs a few microseconds. This makes sure the sdram is fully operational before the cpu starts. The minimig core is one of the few where the sdram controller actually generates a reset signal for the cpu, so the sdram is ready when the cpu starts.

This all isn't a hack. This is how it's supposed to be done.
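The ordering described above can be written down as a tiny state machine. This Python sketch models the variant where the SDRAM controller reports readiness (as described for the minimig core), not the timer-based reset most cores use; all names are hypothetical.

```python
from enum import Enum

class Phase(Enum):
    WAIT_PLL = 0     # clocks not yet stable, nothing may run
    SDRAM_INIT = 1   # PLL locked, SDRAM init sequence in progress
    CPU_RUN = 2      # SDRAM ready, CPU reset released

def next_phase(phase: Phase, pll_locked: bool, sdram_ready: bool) -> Phase:
    """Advance the start-up sequence by one step."""
    if not pll_locked:
        return Phase.WAIT_PLL            # losing lock restarts everything
    if phase is Phase.WAIT_PLL:
        return Phase.SDRAM_INIT
    if phase is Phase.SDRAM_INIT and sdram_ready:
        return Phase.CPU_RUN
    return phase

def cpu_in_reset(phase: Phase) -> bool:
    """The CPU is held in reset until the last phase is reached."""
    return phase is not Phase.CPU_RUN

if __name__ == "__main__":
    phase = Phase.WAIT_PLL
    for locked, ready in [(False, False), (True, False), (True, False), (True, True)]:
        phase = next_phase(phase, locked, ready)
        print(phase.name, "cpu_in_reset =", cpu_in_reset(phase))
```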

The clock shift is something else. The signals driving the sdram need a few nanoseconds to leave the FPGA and to reach the SDRAM chip. All signals are synchronous to the clock (that's what the S in SDRAM stands for). So the clock signal basically says "dear sdram chip, when this clock rises please look at all the other signals". But that means that the FPGA must have set up all those other signals beforehand. So there are two clocks: one used by the FPGA internally to generate "all those signals", and one going to the SDRAM telling it to have a look at "all those signals". And to make sure that "all those signals" have sufficient time to leave the FPGA and to reach the SDRAM, the two clocks are slightly phase shifted. So the FPGA has had enough time to set up "all those signals", and the signals have had enough time to reach the SDRAM, before the SDRAM's clock tells it to have a look at the signals.

This is also fine. At lower clocks this is more relaxed, and the SDRAM may still have enough time to evaluate the signals before the FPGA changes them again. But if things get close to the SDRAM's 133MHz limit, then every nanosecond counts and some fine tuning by shifting e.g. 2.5ns is needed.

This also isn't a hack but needed to fine tune the overall timing to a few nanoseconds.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

But i fully agree that some more complete timing constraints are really needed. And i'd be happy about any contribution here as my experiences with the timing constraints are actually quite limited.

from mist-board.

renaudhelias avatar renaudhelias commented on July 25, 2024

The reset from the ARM is status[0] (status coming from user_io.v).
You can combine it with pll_locked in order to get a nicer processor reset signal.
The reset from the ARM waits for the RAM to be filled with the ROM before it is released, so when it ends you can end the processor reset...

"This also isn't a hack but needed to fine tune the overall timing to a few nanoseconds."

*a patch

from mist-board.

sorgelig avatar sorgelig commented on July 25, 2024

I didn't look at the ST core, but the first thing you need to be sure of: the whole project is synchronous. That means everything is synchronized by one global clock (well, except for external clocks like SPI). If you use always(posedge/negedge some_my_signal), it may break the whole project's functionality. Even a single such "always" may break a lot. You may see many different strange side effects if something in the project is asynchronous.
When I was working on my ZX core and it was asynchronous, it was very hard to add any new module and I couldn't reach a CPU speed of more than 7MHz. When I converted the core to a synchronous style, I could easily reach the theoretical maximum CPU speed of 56MHz, and I could easily add as many new modules as I wanted. It looked like magic.
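One common way to get slower logic while staying on a single global clock is a clock enable: every register is still clocked by the master clock, but only updates on cycles where an enable pulse is high. The comment above does not name this technique, so treat the following Python model as an illustration under that assumption; the names are made up.

```python
def clock_enable(cycle: int, divide: int = 4) -> bool:
    """One-cycle-wide enable pulse, high once every `divide` master-clock cycles."""
    return cycle % divide == divide - 1

def run(n_cycles: int, divide: int = 4) -> int:
    """Count how often a register gated by the enable actually updates."""
    slow_counter = 0
    for cycle in range(n_cycles):
        # The register sees every master-clock edge ...
        if clock_enable(cycle, divide):
            # ... but only changes state when the enable is high.
            slow_counter += 1
    return slow_counter

if __name__ == "__main__":
    # 32 cycles of a 32 MHz master clock give 8 updates, i.e. an 8 MHz rate.
    print(run(32))   # 8
```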

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

The TG68K CPU in the Atari ST core runs pretty stable at 32MHz. This is actually not bad and i haven't seen it running faster anywhere else.

from mist-board.

sorgelig avatar sorgelig commented on July 25, 2024

Maybe in the Apollo accelerator board?

from mist-board.

jotego avatar jotego commented on July 25, 2024

I take the term "synchronous", as Sorgelig uses it, to mean having only one clock domain in the design. And I agree that all hell breaks loose in a design with multiple clock domains if the timing constraints are not set correctly and the inter-domain transfers have not been dealt with in the RTL. But when things are done well in the RTL and the SDC, a design with multiple clock domains can actually be faster than a single-clock counterpart.

As my only experience with digital design is in ASICs, I can assure you that a design with timing issues is not taped out (sent to the foundry). In fact, when I started dealing with FPGAs it was a shock to me to see that the tools produce an output file regardless of the STA (static timing analysis) results. It seems to make sense to be able to make some quick and dirty tests but, upon release, STA must be met. If STA fails, it means that some devices, under some conditions, will fail. And you have no control over when and where they will fail.

I spent some time adding more detailed constraints yesterday and finally got one core that worked well with my FPGA. It still had quite a long list of timing violations, but it was a bit better than the older ones. Then this morning, I turned on MIST and found the attached screen. The timing violations around the video module were showing up again. I reset MIST and it worked, and it has been working for the rest of the day. Will it fail again? Definitely.

It is difficult for me to add constraints to a design that I do not understand. Till Harbaum has shared some architectural aspects that are critical to writing the SDC. On top of that, things like false paths or multicycle paths can only be written with an understanding of the design. And if worse comes to worst and RTL redesign is needed, then understanding of the architecture is critical.

FPGA vendors recommend design approaches like incremental compilation: have a set of the design done and implemented perfectly and then keep it fixed in the FPGA as new modules are added using the rest of the space. Ideally, the MIST core should be clean on its own so new functionality can be added without having to worry about issues in the previous system.

[image: img_20161001_074105]

from mist-board.

 avatar commented on July 25, 2024

FWIW I got the ao68000 core up to 80MHz on my Digilent Atlys. However, the performance isn't great because it's a microcoded core with no pipelines.

S.

In reply to jotego's comment of Sat, Oct 1, 2016 at 10:02 PM.

Stephen Leary

from mist-board.

jotego avatar jotego commented on July 25, 2024

The timing problems are not in the microprocessor but in the peripherals.

By the way, the ao68000 uses half as many logic gates as the TG68K core. It probably compiles faster too because of the microcode.

from mist-board.

jotego avatar jotego commented on July 25, 2024

By the way, what is the frequency of the input port SPI_SCK?

from mist-board.

jotego avatar jotego commented on July 25, 2024

Sorry for the third comment in a row. I found an example of an asynchronous module: mfp_srff16. These are latches with edge set and reset signals. TimeQuest considers the driving signals as clocks and implementation gets messy. I have checked these signals and they are actually generated using an 8MHz clock. I think mfp_srff16 should be synchronous: i.e. regular D-type flip flops using the same 8MHz clock.

If I do that change, the isr_latch signals will get delayed by one 8MHz clock cycle, in comparison with current design. Is that ok? (I am going to try...)
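The one-cycle delay in question can be illustrated with a behavioural Python model. It is a sketch only: it is not the actual mfp_srff16 logic, it samples set/reset once per 8MHz cycle, and giving reset priority over set is an assumption made for the example.

```python
def sr_async(events, q=0):
    """Set/reset element whose output changes in the same cycle as the event."""
    out = []
    for set_, reset in events:
        if reset:
            q = 0
        elif set_:
            q = 1
        out.append(q)
    return out

def sr_sync(events, q=0):
    """Clocked flip-flop version: the new state is visible one cycle later."""
    out = []
    for set_, reset in events:
        out.append(q)          # output during this cycle is the old state
        if reset:
            q = 0
        elif set_:
            q = 1
    return out

if __name__ == "__main__":
    # (set, reset) per 8 MHz cycle: set in cycle 1, reset in cycle 3.
    events = [(0, 0), (1, 0), (0, 0), (0, 1), (0, 0)]
    print(sr_async(events))   # [0, 1, 1, 0, 0]
    print(sr_sync(events))    # [0, 0, 1, 1, 0]
```

Whether software running on the core can observe that one-cycle difference is exactly the behavioural question raised above.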

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

This part of the mfp was redesigned several times to match the behaviour of the real mfp to allow for e.g. flawless midi playback using cubase.

I'd strongly suggest not making changes that will change the behaviour. And of course such changes will require extensive testing. The st will boot with a pretty broken mfp. But you'd see all kinds of strange and hard-to-debug issues in games and demos. These particular parts control the interrupt behaviour, and the symptoms will be stack overflows, irq priority problems and the like. E.g. cubase may crash while you move the mouse. I spent plenty of time finding and fixing mfp problems. You can see that from the commits.

Have a look at early versions. The mfp once was synchronous but didn't work satisfactorily.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

The SPI clock is up to 24 MHz.

from mist-board.

 avatar commented on July 25, 2024

I agree with this. The MFP was why I gave up on my ST core and let Till pick at the bones of what I'd got done

In reply to Till Harbaum's comment of 2 Oct 2016, 08:04.

from mist-board.

jotego avatar jotego commented on July 25, 2024

Thanks! I will constrain each set/reset pin individually to 8MHz then, without touching the RTL.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

Just to make it clear: I have nothing against experiments with other CPU cores and synchronous designs etc. Actually whatever you work on is great! But i'd prefer such extensive changes to happen in separate branches as the chances are that you are not 100% satisfied with the results.

from mist-board.

 avatar commented on July 25, 2024

I would agree with this approach.

In reply to Till Harbaum's comment of 4 Oct. 2016, 19:42.

from mist-board.

jotego avatar jotego commented on July 25, 2024

Yes, that makes sense. If the RTL is modified it needs testing. Notice however that the current system is not stable either because of timing.

I spent more time on this on Sunday. There are too many signals and blocks that I don't know enough about to progress quickly. To be honest, this is quite a diversion from my objective, which was to add FM sound to the ST (see a preview here). So I have decided to put this aside for a better time.

I found today that the version of Quartus we have at work has a full license and I can make partitions with it, which would help with timing closure. However, I do not expect to spend time on this for at least a few months.

If in the meantime someone does a proper timing closure on this, I will be happy to add FM with MIDI to the ST, so we can have MIDI sound without external devices.

from mist-board.

jotego avatar jotego commented on July 25, 2024

The clock shift to the SDRAM is indeed a hack. You can imagine that 2.5ns of delay will only work with some FPGAs and with some SDRAMs under some voltage/temperature conditions. The correct way of doing this is using self-synchronization techniques. You shouldn't need to write a specific time delay in the RTL whatsoever.

Here is a good, if lengthy, application note from Altera that goes through this. The general idea is that the circuit measures the time delay to the external device and uses it to compensate the internal clock. If that document is too long, there is a nice diagram on this other one. Check out figure 40 (page 47). You can see there how the clock is fed back into the DCM (PLL) to compensate for the network delay.

from mist-board.

rkrajnc avatar rkrajnc commented on July 25, 2024

The clock shift is a pretty standard way of handling SDRAM memories. I guess you can call it a hack, but there's no other simple way of handling this (other than significantly lowering the SDRAM clock). Given the relatively slow SDRAM clocks (compared to DDR), it should be OK, even taking into account FPGA and SDRAM tolerances in 'normal' temperature conditions. It is affected more by PCB layout than by anything else, really.

I only skimmed over the Altera doc you linked, but that seems to deal only with DDR memories, which pose a whole different set of problems from 'normal' SDRAM memories, with its relatively slow clock frequency and high I/O voltages. The DDR interfaces, on the other hand, require proper balancing of clock delays, and that is indeed usually handled in an automatic way.

The other document - the Xilinx one - only mentions DCM clock compensation, which I believe is internal logic delay only, it doesn't take into account external delays. Besides, the Xilinx devices have DCM (sort of a DLL, with added stuff), which is a different thing than Altera's PLLs, so I don't think that document is applicable here.

There is one document that describes how to handle SDRAM memories with Altera FPGAs: https://www.altera.com.cn/zh_CN/pdfs/literature/hb/nios2/n2cpu_nii51005.pdf

While I don't think it is necessary to worry about it (more problems would be solved by switching to a 4-layer PCB, than worrying about SDRAM clock delay), if you do have any more info, or any simple way to do auto-delay with SDRAM, I'd be interested to hear about it.

from mist-board.

jotego avatar jotego commented on July 25, 2024

Actually... Now that you mention it, it does seem to be pretty common. Up to the point of being in an appnote! It is shocking to me as an ASIC designer!

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

I am fully aware that the MIST contains a lot of compromises, small hacks and imperfections. But without that it would have never seen the light of day. And i understand that it can be frustrating to face some of its shortcomings. The main design goal was "fun" which imho was achieved. Now it may be time to focus a little more on perfection, indeed. Hopefully that will not stop you from contributing ...

Did you take a look at other projects? E.g. the suska or the firebee may be what you are looking for.

from mist-board.

renaudhelias avatar renaudhelias commented on July 25, 2024

Phase shift is in degrees (modulo 360); here the phase shift value is -2500, which is why I called it a hack: it does insert a boot delay. But it is a nice hack (no problem with timing constraints here).
The timing-constraint problem here is around the RTL that builds the 8MHz clock, and perhaps also the "negedge pll_locked" (pll_locked is not a clock, it's a reset signal).

Found a lot of talks around that (clock/reset) : http://electronics.stackexchange.com/questions/163018/asynchronous-reset-in-verilog ...

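In VHDL I use reset like this (asynchronous reset; it is not a clock):

ctrcConfig_process: process(reset, nCLK4_1) is
begin
    if reset = '1' then
        Dout <= (others => '1');
    elsif rising_edge(nCLK4_1) then
        -- ... normal clocked behaviour
    end if;
end process;

But it can also be written like this (synchronous reset):

ctrcConfig_process: process(nCLK4_1) is begin
    if rising_edge(nCLK4_1) then
        if reset = '1' then  -- ... reset actions
        else                 -- ... normal clocked behaviour
        end if;
    end if;
end process;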

http://github.com/renaudhelias/CoreAmstrad/blob/master/BuildYourOwnZ80Computer/simple_GateArrayInterrupt.vhd

In http://github.com/renaudhelias/CoreAmstrad/blob/master/BuildYourOwnZ80Computer/zsdram.v I play with a captured clock (clkref_i): a clock that is not a clock. I put a lot of effort into zsdram.v; perhaps you can merge it if you want.

from mist-board.

jotego avatar jotego commented on July 25, 2024

"I am fully aware that the MIST contains a lot of compromises, small hacks and imperfections. But without that it would have never seen the light of day. And i understand that it can be frustrating to face some of its shortcomings. The main design goal was "fun" which imho was achieved. Now it may be time to focus a little more on perfection, indeed. Hopefully that will not stop you from contributing ..."

I like the word perfection. I will continue working on sound chip cores for a while before I return to trying to add a full set of functions to an existing core.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Hi guys,
I've had a side project for some time, where I'm trying to make this core synchronous.
I've finished with DMA (it was a tough one) and the shifter, and improved FMax with some tweaks in code that is not timing-sensitive (like the OSD, etc.).
I haven't even dared to touch the MFP yet.
There's a problem sometimes right after booting where the low and mid resolutions are garbled, but I don't think it is because of code changes, since it started right after I enabled time-based synthesis.
Here's the current state:
https://github.com/gyurco/mist-board/commits/mist-experiments
My final goal is to replace the CPU to FX68K (or make it optional at least).

from mist-board.

jotego avatar jotego commented on July 25, 2024

Hi Gyurco,

I like it so much that you are doing this! Let me share a couple of things I've learnt:

  • You can simulate your design using the memory model from the manufacturer. There is a copy of that file in my jt_gng repository. When you do that you can see how phase shifting affects it and the right value for CL in the SDRAM. Notice that I have a positive value for the SDRAM CLK time shift in my cores. I have no timing errors.

  • The 1943 core in MiSTer has severe timing errors in all corners. It should not work at all. If it does, it is because there is a lucky combination of clock delay inside the FPGA, a negative time shift for the SDRAM clock and a CL value that happens to work. I want to fix it at some stage but I do not have so much control over the MiSTer framework.

  • A few weeks ago I made a number of changes to my 1942 core very quickly without testing them on the FPGA. When I finally tested them I found that it didn't work. Trying to fix the problem drove me to making other changes but it still didn't work. I fell into a loop and spent 6 days trying to fix it. Eventually, I had to go back to a secure place in the repo and apply changes one step at a time while trying it. Since then, I commit small steps when I am about to make a big change. Maybe that advice can be helpful to you too.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Hi Jotego, thanks for sharing your thoughts.

  • For the SDRAM, I guess I'm using the TimeQuest report to get the correct shift. Interestingly, if you use the dedicated PLL output (c0 or c3 of PLL1), the clock is delayed much less. I had a problem with the mt48lc16m16a2.v module in simulation: it uses the # notation for delays, which Verilator doesn't support. Maybe that is why the Archie had bad timings in its SDRAM controller: the simulation returned good data with a bad CL setting.
  • For the MiSTer SDRAM, I think you cannot know the real delay, because the use of the GPIO pins adds an unknown factor, so the SDRAM datasheet delay values are probably not usable in the SDC.
  • Yepp, I also like small atomic commits :) git bisect is a great tool to find regressions.

I've progressed a bit more, almost all generated clocks are eliminated, except in MFP. Now I'll try to change the CPU to the cycle-exact one.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Connected the FX68k CPU; it starts, but hangs after a while at 41fffe. In the code, it's some kind of IO for hbi; it would be good to have some info on what's done there.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

41fffe sounds like the first time the ramtest hits a bus error on a 4mb machine.

My guess: the bus error does not work and the CPU waits forever for DTACK.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Yes, I checked: the bus error is generated, but not cleared. The tg68 has a bus error clear output, but the original CPU doesn't. How is the GLUE supposed to clear BERR?
Or is the CPU supposed to jump to the bus error handler first? Seems Genesis knowledge is not enough here :)

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Seems clearing BERR after some cycles allows it to continue. Now I guess I have memory write problems, probably because the original CPU's signals are timed differently than expected (e.g. during a write, UDS and LDS are delayed by one cycle relative to AS and RW).

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

I think you would apply BERR in parallel with DTACK just like any other CPU input signal. For the tg68k I had to generate a latched signal, and thus there's a separate clear signal. With the fx68, none of this should be needed imho.

The tg68k was designed for the Amiga which does not use BERR at all.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

BTW: If the BERR is not deasserted fast enough the CPU would detect a double BERR and would go into HALT state.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

As I read in the 68000 manual, it's not critical how much time the BERR remains asserted (Figure 5-25).
"As long as BERR remains asserted, the data and address buses are in the high-impedance state."

Upd: now it's in an endless loop in TOS, around E013f6

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

But you need to do something to get out of the berr state. Isn't the CPU still waiting for the dtack?

I can have a look what that loop waits for. I remember such issues when writing the core in the beginning.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

No, BERR seems to be OK, it's implemented as the 68000 manual says now (clear after some cycles).
I strongly believe writes to the SDRAM are the problem now.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

But writes are generally working as e.g. the ram test passes.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Forgot that the TG68k is clocked at 2 MHz to get the effect of an 8 MHz 68000. Clocking the FX68 at 8 MHz while making sure that the established bus cycles are OK is a challenge now.
Maybe if dtack_n generation remains the same, then the CPU will align itself to the existing cycles.
Upd: seems it's going further now that the clock is corrected. Now it loops at e02438, constantly writing to fffa22 (MFP?). I wonder if the existing MFP is compatible.
So is there a quick memtest before the Atari logo?

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

There is a quick test that checks the RAM layout and configures the MMU appropriately. It's not a real RAM test, but it tests a few cells here and there to figure out the type/size of the RAM chips. The real RAM test comes after the logo. But when you see the logo, then the CPU was able to paint the logo into video RAM.

Really fffa22? Likely with only LDS asserted? That would be a byte access to fffa23. That loop at e02438 seems to "poll" timer C. It writes a value to that register, and as long as it reads a different value back, it tries again. This seems to test if writing that register actually works. The counter may actually be incremented in that very moment, so it makes sense to try this a few times. But if it constantly fails, then either reading or writing MFP registers does not work.
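
For readers following along, here is a minimal Python sketch of the two ideas in this comment: how a word address plus the UDS/LDS strobes maps to byte addresses on a 68000 bus, and the shape of a "write, then read back until it sticks" loop. The function names and the `max_tries` limit are illustrative assumptions, not taken from TOS or from the core.

```python
def selected_bytes(word_addr, uds, lds):
    """Return the byte addresses touched by one 68000 bus cycle.

    UDS selects the even byte (D15-D8), LDS selects the odd byte (D7-D0).
    """
    if word_addr & 1:
        raise ValueError("the 68000 address bus carries even (word) addresses")
    addrs = []
    if uds:
        addrs.append(word_addr)
    if lds:
        addrs.append(word_addr + 1)
    return addrs


def poll_register(write, read, value, max_tries=8):
    """Write `value` and read it back until it matches.

    Returns the number of attempts used, or None if the value never sticks.
    `max_tries` is a hypothetical limit; the loop described above may retry
    without any limit.
    """
    for attempt in range(1, max_tries + 1):
        write(value)
        if read() == value:
            return attempt
    return None
```

With only LDS asserted, `selected_bytes(0xFFFA22, uds=False, lds=True)` yields the single byte address 0xFFFA23. A register whose reads or writes are broken makes `poll_register` give up, which corresponds to the endless loop seen at e02438.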

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Yes, it writes and then re-reads the same value again and again. I wonder why it doesn't work; the data/address bus seems to be stable for the whole 8 MHz cycle during the MFP access. Or something fails earlier (I don't know if the timer should have been started before).
Seems it writes to the timer C control register, but only a 0.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

FFFA23 is "MFP Timer C data". What makes you think it's the control?

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

I've checked whether it tries to start the timer: it writes 0 to the control register before the endless write-read cycle on FFFA23. I should check with the TG68k what is supposed to happen.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

Timer C is the 200 Hz system timer. You should be able to see the write and read data. And they should be the same but aren't.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Seems with TG68, the control register is set to 5h; with FX68k, it's set to 0. Something is very broken.
Meanwhile I've uploaded the FX68K patch to my experimental branch (FX68K can be enabled via a `define at the beginning of mist_top.v), if you're interested (I secretly hope you have some idea how to fix it before I start a long and tedious debugging session).

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

I'll have a look ....

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Thanks! Hope you'll find interesting to have a cycle-perfect CPU in the core. No need to hurry btw :)

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Some progress: the timer is initialized correctly, and now it draws something on the display: 2 bombs. But better than nothing. I think the interrupts are not working now, maybe because of the existing auto-vector emulation? If I understand correctly, HBI and VBI are auto-vectored, while the MFP interrupt places its vector on the bus.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

Yes, exactly. And since the tg68 had no working auto vectors i made them kind of fake auto vectors.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Checked this, but it seems that up to the point it reaches, there are only MFP interrupts. Two bombs = BUS ERROR again, but this time it's not expected. It tries to write to 000000h.
I see many other bus errors, like when checking for the STE DMA, FPU, etc., but those are probably expected and handled.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

I'll have a look

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Thanks! All uploaded to my mist-experiments branch. I've tried TOS 1.06, it's 33 bombs, I think it's TRAP #1.
Upd.: changed hbi and vbi to native auto-vectoring, and GEM starts with TOS 1.06! Also Operation Wolf loads. Hallelujah!
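
As a reading aid for the bomb counts mentioned in this thread: TOS shows one bomb per 68000 exception vector number, so the count can be decoded with a small lookup. The sketch below covers only a few well-known vectors; the function name is an illustrative assumption.

```python
def describe_bombs(count):
    """Map a TOS bomb count to the 68000 exception it stands for."""
    named = {
        2: "bus error",
        3: "address error",
        4: "illegal instruction",
    }
    if 32 <= count <= 47:
        # Vectors 32..47 are TRAP #0..#15.
        return f"TRAP #{count - 32}"
    return named.get(count, f"exception vector {count}")
```

This agrees with the two observations above: 2 bombs is a bus error, and 33 bombs is TRAP #1.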

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

A question about memory access: I see there are 4 bus cycles, each lasting one 8 MHz cycle.
Then there's a contradiction in the source:
https://github.com/mist-devel/mist-board/blob/master/cores/mist/mist_top.v#L1005

It says one cycle for the CPU, one for the shifter, and the others are not used (which is about the same as the original hardware). But the code assigns two video cycles. I checked with SignalTap II: the second one is not really used in normal modes. Maybe it is required for Viking? And I wonder about the 16 MHz Mega STE: is its memory bandwidth really doubled?

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

Two cycles are used if the "viking" hirez video card is emulated. A normal Atari ST does not have that and it would only need one slot for CPU and one for video leaving 50% of the bandwidth unused.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

And a real Mega STE basically is a regular 8Mhz ST with a cache equipped speeder card built-in. A Mega STE does not have a different memory bandwidth. But without caches you'd have to double the memory slots to approximate the Mega STE speed.
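
The slot usage described in the last two comments can be summarized with a toy Python model. It assumes four memory slots per round in the order CPU, video, CPU2, video2 (the order mentioned elsewhere in this thread); the names `slot_plan` and `unused_fraction` are hypothetical and do not exist in the core.

```python
def slot_plan(viking=False, cpu_16mhz=False):
    """Return the owner of each of the four memory slots in one round.

    Assumption: the second CPU slot is only used to approximate the
    16 MHz Mega STE, and the second video slot only for the Viking card.
    """
    return [
        "cpu",
        "video",
        "cpu2" if cpu_16mhz else "idle",
        "video2" if viking else "idle",
    ]


def unused_fraction(plan):
    """Fraction of the memory bandwidth left unused by a slot plan."""
    return plan.count("idle") / len(plan)
```

For a plain ST, `unused_fraction(slot_plan())` is 0.5, which is the "50% of the bandwidth unused" figure above. The model leaves out any extra slots for ROM reads.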

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

I see, I'm starting to understand. It's a bit shocking that the ST's memory bandwidth is only twice that of the old Apple II (OK, four times if 16-bit vs 8-bit is taken into account), which works in a similar setup.
I'm still struggling with the peripherals. As far as I can tell, most problems come from the level-triggered response to the CS and RWn signals: with FX68K these stay active for more cycles than with TG68K, so a write is repeated in every cycle until they are de-asserted.
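
To illustrate why a longer strobe hurts a level-triggered peripheral, here is a small Python model of a register with a write side effect (for example, pushing into a FIFO). The strobe lengths are made-up examples, not measured values from either CPU core, and `count_writes` is a hypothetical helper.

```python
def count_writes(strobe_samples, edge_detect):
    """Count how many writes a peripheral performs.

    `strobe_samples` holds the per-clock state of the combined CS/write
    strobe. A level-triggered peripheral writes on every active sample,
    an edge-detecting one only on the inactive-to-active transition.
    """
    writes = 0
    previous = False
    for active in strobe_samples:
        if edge_detect:
            fire = active and not previous
        else:
            fire = active
        if fire:
            writes += 1
        previous = active
    return writes


short_strobe = [False, True, False]             # active for one clock
long_strobe = [False, True, True, True, False]  # active for three clocks
```

With the short strobe both variants write once. With the long strobe the level-triggered variant writes three times, while the edge-detecting variant still writes once.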

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

In TOS 2.06, after the Memory Test Complete message, there's a black bar, which should shrink. But it doesn't; it just waits for something (maybe an interrupt?). Do you know what should happen there?
TOS 1.02 is mostly OK.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

I just ran tos 2.06 and it booted into the desktop and i could even load a benchmark from floppy disk and run it.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

With my branch? Excellent. Since I asked about the black bar, I've found an issue with the IACK of the MFP (nothing special happens there, timer C should just fire periodically). Now struggling with DMA again. But I expect that pure ST mode will be usable soon.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

I wonder if there are any good tests for the blitter?

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

If you enable the blitter in the desktop then the desktop itself is painted using the blitter. If that works then the blitter is mostly fine.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

@gyurco This means that spectrum512 images work now. That's impressive. They use synchronous code execution to change palette registers on the fly. Another thing that might now be possible is to implement the cubase 2 dongle. That also relies on a cycle exact cpu.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Well, not in all games, but in a lot, e.g. Arkanoids, Highway Encounter, etc. It's still wrong in ATF2, for example. Is this dongle a USB gadget? Well, I'm not even sure MIDI still works :) There were changes in the ACIA, but I don't have MIDI ports on my MiST, and I'm also not familiar with how to try it. Same for the external parallel and serial ports. I didn't touch ethernec, but it would be good to change it to the clk32/clk_8_en structure.
And Defender of the Crown is still broken: it looks frozen after one turn (same on the old core).
But I want to fix STE DMA audio first (Stardust with intro is strangely broken; btw it seems to have wrong colors mid-game with the original version).
And I found one thing worth implementing (and not hard): two memory slots for ROM reads. As I see on the schematics, the ROM is not on the bus shared between the CPU and the shifter.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

That dongle is a "cartridge". It basically has some complex state machine which is triggered by reads to certain addresses. It then changes states depending on the CPU's bus activity, and a few moments later the CPU verifies the dongle cartridge's state. This state only matches when all bus activity during that time is exactly what it's supposed to be. If it's not, then either there is no dongle or the CPU does not work 100% the way it's supposed to. That's why the cubase2 dongle does not work with speeder cards or the mega ste. Their caches cause the bus traffic to differ, and thus the program assumes there is no dongle.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

And yes, adding some rom read slots may be good. Currently the core runs about 10% slow according to benchmarks. This may be related to this.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Added a second slot for ROM in normal modes, a second slot for every memory access in MegaSTE 16 MHz mode, and the STEroid (which is just a MegaSTE with a default 16 MHz clock). I think only Viking is missing now, and then the bug hunting must start.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Ahh! Switching PAL mode to 50Hz from 56Hz shows the remaining Spectrum512 title screens correctly!

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

There are a couple of games that don't respond well to controls (HD install versions). E.g. R-Type I: no control at all. It's the same with the original unmodified core. What do you think, is it an MFP, ACIA, or maybe a firmware IKBD issue?

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Also I wonder if there are any test suites for emulators, like the VDP tests for Genesis, the VICE tests for C64 and so on. It would be good to check the MFP for accuracy, which I guess is the biggest source of problems.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Hmm, in R-Type I see in the ARM serial output:
IKBD: Disable mouse
IKBD: Set relative mouse positioning

I wonder if the mouse should be re-enabled after seeing a new mouse command. Hmm, it seems re-enabled in the code, but no "Sent bytes..." after that point.

Upd.: seems a PAUSE is also issued (just no debug output) and "set relative mouse positioning" should unpause. R-Type is controllable!
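
The fix described above can be pictured with a small Python state model. The command codes below follow the IKBD protocol as commonly documented (0x08 set relative mouse positioning, 0x11 resume, 0x12 disable mouse, 0x13 pause output), but treat them and the whole class as assumptions: this is not the MiST firmware code, only a sketch of the behaviour reported in this comment.

```python
SET_RELATIVE_MOUSE = 0x08  # assumed IKBD command codes, see note above
RESUME = 0x11
DISABLE_MOUSE = 0x12
PAUSE_OUTPUT = 0x13


class IkbdModel:
    """Hypothetical model of the mouse-reporting state of the IKBD."""

    def __init__(self):
        self.mouse_enabled = True
        self.paused = False

    def command(self, code):
        if code == DISABLE_MOUSE:
            self.mouse_enabled = False
        elif code == PAUSE_OUTPUT:
            self.paused = True
        elif code == RESUME:
            self.paused = False
        elif code == SET_RELATIVE_MOUSE:
            # Selecting a mouse mode re-enables the mouse and, per the
            # fix described above, also ends a pending pause.
            self.mouse_enabled = True
            self.paused = False

    def reports_mouse(self):
        return self.mouse_enabled and not self.paused
```

After a disable and a pause, `reports_mouse()` is false; a following "set relative mouse positioning" command makes it true again, which matches R-Type becoming controllable.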

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

Wow, cool. Thanks a lot.

from mist-board.

squidrpi avatar squidrpi commented on July 25, 2024

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Is anyone able/willing to test whether the MIDI/serial/parallel ports are still working?

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

I'll test the coming weekend if the midi ports are still working. The parallel and serial redirection may need some more time to re-assemble the old test setup. I'll also test the four player/joysticks gauntlet which is related to the printer port.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Great! I guess the gauntlet only depends on the PSG (which was replaced with a recent one, so it would be good to test that also).
Meanwhile I still haven't found out why some games/demos run only in STEroid mode (16 MHz).
Always reproducible: Stardust with Hardcore intro (with trainer).
I assumed it's because TG68K has a much shorter INTACK cycle, so I tried to:

  • reduce delay of st_de (the intro uses timer B)
  • reduce delay of hbi and vbi
    but didn't work.

Defender of the Crown is very sensitive to slowdowns, and it started to work after the memory wait states were corrected.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

I recently fixed some issue in the old PSG implementation related to the fact that the PSG outputs are open drain and you can configure them as outputs and still read them like inputs. This was rather tricky to figure out .... will be interesting to see if the new PSG copes with that.
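
The open-drain behaviour mentioned here can be modelled in a few lines of Python. The sketch mirrors the expression suggested earlier in this thread (`ymreg[7][6] ? ymreg[14] & IOA_in : IOA_in`); the function and parameter names are illustrative, and `pins_in` stands for the level that external devices put on the pins (1 = released, 0 = pulled low).

```python
def psg_port_a_read(output_enabled, latch, pins_in):
    """Value read back from an open-drain 8-bit PSG I/O port.

    In output mode a latched 0 pulls the pin low, while a latched 1 only
    releases it, so an external device can still pull the pin low. The
    value read is therefore the AND of the latch and the external level.
    """
    if output_enabled:
        return latch & pins_in & 0xFF
    return pins_in & 0xFF
```

For example, with the port configured as output, the latch at 0xFF and an external device pulling bit 0 low, the read returns 0xFE rather than the latched 0xFF.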

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

I wonder whether delaying DE to the MFP is needed: DE from the GLUE to the MFP is just a direct connection. Any delay that occurs between DE and the interrupt happens in the MFP and the CPU.
Also I wonder why custom overscan detection is needed. It should happen "naturally" if the GLUE parameters are changed at the exact places. Maybe it is relevant only if the CPU is not accurate?

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

I just spent an hour listening to MIDI songs played back on a KORG AG10 synth. MIDI works :-)

For some reason color video is totally broken for me. But B/W and Viking works and gives a nice Cubase experience.

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

The overscan detection basically enables a "fake" overscan in the video circuitry. The real overscan happens because people are messing with the 50/60/70 Hz video modes halfway through a line or image. The video circuitry then gets its counters messed up and then you see more pixels than you should. I never implemented the video stuff in a way that would naturally expose this same counter behaviour. Actually, at that time it wasn't even fully known/documented.

So instead I had some triggers that fired when the mode changes happened at certain times during screen display, and they would trigger some behaviour that somehow resembles what a real Atari ST would do. It's just a big hack ....

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

Wow. That's great!

from mist-board.

harbaum avatar harbaum commented on July 25, 2024

Hey, that looks super cool. You note that refresh and CAS generation are not useful. But the SDRAM still has these. So it might be possible to use these directly. But unfortunately the SDRAM in the MIST is 16 bit and the RAM in the ST was 32 bit ... Maybe it's still feasible with a two word burst ...

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

I think RAS_N can start an SDRAM r/w cycle. And there's the autorefresh command for refresh, so it seems to me that CAS_N can be skipped. The SDRAM has its own RAS-CAS latency; maybe the one generated by the GSTMCU wouldn't even fit that (and that depends on the clock of the SDRAM controller, too).
32 bit? Wasn't that the TT? The STf/STe has a 16-bit wide bus (on both the CPU and the memory side, with some TTL circuits controlled by the WDAT_N, RDAT_N and LATCH signals acting as the gateway between them; on the STe, this is part of the shifter).
A slightly bigger issue is that ROM access is 0 wait state, since the ROM is directly on the CPU bus, but here we have to store it in RAM and can probably access it only in "CPU" slots. Maybe that's where refresh can be useful: it occupies the unused "Video" slots during blanking, and we can use those for ROM access.

from mist-board.

gyurco avatar gyurco commented on July 25, 2024

Thinking further about the ROM issue: an SDRAM controller faster than required (4 MHz cycles, not very fast), e.g. your super-speed controller currently in the core, plus a separate port for ROM reads could solve it correctly with 0 WS. Even an SDRAM controller at 32 MHz could cope with that.

from mist-board.
