t-crest / patmos

Patmos is a time-predictable VLIW processor, and the processor for the T-CREST project

Home Page: http://patmos.compute.dtu.dk

License: BSD 2-Clause "Simplified" License

Assembly 0.17% Makefile 0.20% C 15.24% C++ 0.19% Scala 2.54% VHDL 65.60% Verilog 9.76% Stata 0.06% Tcl 1.34% Shell 0.06% CMake 0.01% Java 0.18% Python 0.04% Mathematica 0.01% Raku 0.01% SystemVerilog 2.35% V 1.88% Promela 0.38%

patmos's People

Contributors

cgkiokas, davidchong99, dlp, dsanz006, edga, egk696, elthra, emoun, epsilon-0311, evaka, henrikh, jeunes2, jgyork, lucapezza, majalund, michael-platzer, mortbopet, mziccard, nothinn, ogedai10, phell, philipp-birkl, rbscloud, sabb, schoeberl, stefanhepp, thonner, tjarker, torurstrom, visq

patmos's Issues

Build Errors ptplib_demo

Hi all,

I wanted to test ptplib_demo and got a set of compilation errors. I will work on the fix.

ptplib_demo.c:93:117: error: too many arguments to function call, expected 7, have 8
                        ptpv2_issue_msg(thisPtpPortInfo, tx_addr, rx_addr, PTP_BROADCAST_MAC, PTP_MULTICAST_IP, seqId, PTP_SYNC_MSGTYPE, syncInterval);
                        ~~~~~~~~~~~~~~~                                                                                                  ^~~~~~~~~~~~
/opt/t-crest/patmos/c/ethlib/ptp1588.h:187:1: note: 'ptpv2_issue_msg' declared here
int ptpv2_issue_msg(PTPPortInfo ptpPortInfo, unsigned tx_addr, unsigned rx_addr, unsigned char destination_mac[6], unsigned char destination_ip[4], unsigned seqId, unsigned msgType);
^
ptplib_demo.c:95:119: error: too many arguments to function call, expected 7, have 8
                        ptpv2_issue_msg(thisPtpPortInfo, tx_addr, rx_addr, PTP_BROADCAST_MAC, PTP_MULTICAST_IP, seqId, PTP_FOLLOW_MSGTYPE, syncInterval);
                        ~~~~~~~~~~~~~~~                                                                                                    ^~~~~~~~~~~~
/opt/t-crest/patmos/c/ethlib/ptp1588.h:187:1: note: 'ptpv2_issue_msg' declared here
int ptpv2_issue_msg(PTPPortInfo ptpPortInfo, unsigned tx_addr, unsigned rx_addr, unsigned char destination_mac[6], unsigned char destination_ip[4], unsigned seqId, unsigned msgType);
^
ptplib_demo.c:101:141: error: too many arguments to function call, expected 7, have 8
                                        ptpv2_issue_msg(thisPtpPortInfo, tx_addr, rx_addr, PTP_BROADCAST_MAC, lastSlaveInfo.ip, rxPTPMsg.head.sequenceId, PTP_DLYRPLY_MSGTYPE, syncInterval);
                                        ~~~~~~~~~~~~~~~                                                                                                                        ^~~~~~~~~~~~
/opt/t-crest/patmos/c/ethlib/ptp1588.h:187:1: note: 'ptpv2_issue_msg' declared here
int ptpv2_issue_msg(PTPPortInfo ptpPortInfo, unsigned tx_addr, unsigned rx_addr, unsigned char destination_mac[6], unsigned char destination_ip[4], unsigned seqId, unsigned msgType);
^
ptplib_demo.c:127:141: error: too many arguments to function call, expected 7, have 8
                                        ptpv2_issue_msg(thisPtpPortInfo, tx_addr, rx_addr, PTP_BROADCAST_MAC, lastMasterInfo.ip, rxPTPMsg.head.sequenceId, PTP_DLYREQ_MSGTYPE, ptpTimeRecord.syncInterval);
                                        ~~~~~~~~~~~~~~~                                                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/t-crest/patmos/c/ethlib/ptp1588.h:187:1: note: 'ptpv2_issue_msg' declared here
int ptpv2_issue_msg(PTPPortInfo ptpPortInfo, unsigned tx_addr, unsigned rx_addr, unsigned char destination_mac[6], unsigned char destination_ip[4], unsigned seqId, unsigned msgType);
^
ptplib_demo.c:133:140: error: too many arguments to function call, expected 7, have 8
                                ptpv2_issue_msg(thisPtpPortInfo, tx_addr, rx_addr, PTP_BROADCAST_MAC, lastMasterInfo.ip, rxPTPMsg.head.sequenceId, PTP_DLYREQ_MSGTYPE, ptpTimeRecord.syncInterval);
                                ~~~~~~~~~~~~~~~                                                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/t-crest/patmos/c/ethlib/ptp1588.h:187:1: note: 'ptpv2_issue_msg' declared here
int ptpv2_issue_msg(PTPPortInfo ptpPortInfo, unsigned tx_addr, unsigned rx_addr, unsigned char destination_mac[6], unsigned char destination_ip[4], unsigned seqId, unsigned msgType);
^
ptplib_demo.c:181:109: error: too few arguments to function call, expected 6, have 4
                thisPtpPortInfo = ptpv2_intialize_local_port(PATMOS_IO_ETH, my_mac, (unsigned char[4]){192, 168, 2, 50}, 1);
                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~                                                              ^
/opt/t-crest/patmos/c/ethlib/ptp1588.h:181:1: note: 'ptpv2_intialize_local_port' declared here
PTPPortInfo ptpv2_intialize_local_port(unsigned int eth_base, int portRole, unsigned char mac[6], unsigned char ip[4], unsigned short portId, int syncPeriod);
^
ptplib_demo.c:198:124: error: too few arguments to function call, expected 6, have 4
                thisPtpPortInfo = ptpv2_intialize_local_port(PATMOS_IO_ETH, my_mac, (unsigned char[4]){192, 168, 2, rand_addr}, rand_addr);
                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~                                                                             ^

Split off simulator into its own repo

I think we should extract the simulator (everything under patmos/simulator) into its own repository under t-crest.

I need this to set up proper automated testing and deployment using GitHub and Travis CI, which is currently blocked by a circular dependency between our repositories:
To build and test LLVM we need pasim; however, to build and test Patmos (which contains pasim) we need LLVM. The simulator on its own does not depend on LLVM, so if we extract it, we can build it independently and then use it for the LLVM build.

I have already tested that the simulator folder is completely self-contained and can simply be extracted from the patmos repository. However, I do not have permission to make new t-crest repositories, so I need someone who can do this for me.
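The circular dependency can be checked mechanically as a build-order problem. The following is a minimal sketch, not part of any T-CREST tooling: repositories are nodes in a hypothetical dependency matrix (`repo_graph`, `build_order` are illustrative names), and a simple Kahn-style topological sort either finds a build order or reports that none exists.

```c
#include <string.h>

#define MAX_REPOS 8

/* Hypothetical dependency graph: needs[i][j] != 0 means repository i needs
   repository j to be built before it can itself be built and tested. */
typedef struct {
    int n;
    int needs[MAX_REPOS][MAX_REPOS];
} repo_graph;

/* Simple Kahn-style topological sort: repeatedly build every repository whose
   dependencies are all built. Writes the build order into `order` and returns
   how many repositories could be placed; fewer than g->n means a cycle. */
static int build_order(const repo_graph *g, int order[MAX_REPOS]) {
    int unbuilt_deps[MAX_REPOS];
    int built[MAX_REPOS];
    int count = 0;
    memset(built, 0, sizeof built);
    for (int i = 0; i < g->n; i++) {
        unbuilt_deps[i] = 0;
        for (int j = 0; j < g->n; j++)
            unbuilt_deps[i] += (g->needs[i][j] != 0);
    }
    int progress = 1;
    while (progress) {
        progress = 0;
        for (int i = 0; i < g->n; i++) {
            if (built[i] || unbuilt_deps[i] != 0)
                continue;
            built[i] = 1;
            order[count++] = i;
            progress = 1;
            /* repository i is now available to everything that needs it */
            for (int k = 0; k < g->n; k++)
                if (g->needs[k][i])
                    unbuilt_deps[k]--;
        }
    }
    return count;
}
```

With today's layout (Patmos containing pasim, and LLVM needing pasim) no repository can be placed at all; once the simulator is its own node, the order simulator, LLVM, Patmos falls out.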

Mismatch between stack cache and other memory

The test cases inst_tests/stackcache_load_store.s and inst_tests/datacache_load_store.s are almost identical. However, the first load to r3 sets the register to 0xffffffff when loading from the stack cache, while the result of the corresponding load from the data cache (or local memory or global memory) is 0. As the same data is written to the respective address before the load, the results should be identical.

Patmos emulator fails to build with Verilator >= 4.212

The Patmos emulator fails to build with recent versions of Verilator (tested with 4.212 and 4.214). misc/build.sh patmos produces the following error:

g++  -I.  -MMD -I/usr/share/verilator/include -I/usr/share/verilator/include/vltstd -DVM_COVERAGE=0 -DVM_SC=0 -DVM_TRACE=1 -DVM_TRACE_FST=0 -faligned-new -fcf-protection=none -Wno-bool-operation -Wno-sign-compare -Wno-uninitialized -Wno-unused-but-set-variable -Wno-unused-parameter -Wno-unused-variable -Wno-shadow     -Wno-undefined-bool-conversion -O1 -DTOP_TYPE=VPatmos -DVL_USER_FINISH -include VPatmos.h  -DVL_THREADED -std=c++17  -c -o VPatmos__Trace__4__Slow.o VPatmos__Trace__4__Slow.cpp
echo "" > VPatmos__ALL.verilator_deplist.tmp
../Patmos-harness.cpp: In member function ‘void Emulator::emu_uart(int, int)’:
../Patmos-harness.cpp:191:12: error: ‘class VPatmos’ has no member named ‘Patmos__DOT__UartCmp__DOT__uart__DOT__uartOcpEmu_Cmd’
  191 |     if (c->Patmos__DOT__UartCmp__DOT__uart__DOT__uartOcpEmu_Cmd == 0x1
      |            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:192:16: error: ‘class VPatmos’ has no member named ‘Patmos__DOT__UartCmp__DOT__uart__DOT__uartOcpEmu_Addr’
  192 |         && (c->Patmos__DOT__UartCmp__DOT__uart__DOT__uartOcpEmu_Addr & 0xff) == 0x04) {
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:193:28: error: ‘class VPatmos’ has no member named ‘Patmos__DOT__UartCmp__DOT__uart__DOT__uartOcpEmu_Data’
  193 |       unsigned char d = c->Patmos__DOT__UartCmp__DOT__uart__DOT__uartOcpEmu_Data;
      |                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:201:25: error: ‘class VPatmos’ has no member named ‘Patmos__DOT__UartCmp__DOT__uart__DOT__tx_baud_tick’
  201 |     bool baud_tick = c->Patmos__DOT__UartCmp__DOT__uart__DOT__tx_baud_tick;
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:216:16: error: ‘class VPatmos’ has no member named ‘Patmos__DOT__UartCmp__DOT__uart__DOT__rx_state’
  216 |             c->Patmos__DOT__UartCmp__DOT__uart__DOT__rx_state = 0x3; // rx_stop_bit
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:217:16: error: ‘class VPatmos’ has no member named ‘Patmos__DOT__UartCmp__DOT__uart__DOT__rx_baud_tick’
  217 |             c->Patmos__DOT__UartCmp__DOT__uart__DOT__rx_baud_tick = 1;
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:218:16: error: ‘class VPatmos’ has no member named ‘Patmos__DOT__UartCmp__DOT__uart__DOT__rxd_reg2’
  218 |             c->Patmos__DOT__UartCmp__DOT__uart__DOT__rxd_reg2 = 1;
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../Patmos-harness.cpp:219:16: error: ‘class VPatmos’ has no member named ‘Patmos__DOT__UartCmp__DOT__uart__DOT__rx_buff’
  219 |             c->Patmos__DOT__UartCmp__DOT__uart__DOT__rx_buff = d;
      |                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
At global scope:
cc1plus: note: unrecognized command-line option ‘-Wno-undefined-bool-conversion’ may have been intended to silence earlier diagnostics

Downgrading to Verilator 4.200 fixes the problem.

ISA change proposal: split/deferred instructions

This issue tracks the discussion about changing the Patmos ISA to use either deferred or split instructions.

Motivation

Some types of instructions cannot be executed without incurring some kind of delay or latency in the pipeline. One example is load instructions, which currently have a one-cycle delay slot before the loaded value can be used. Another example is multiply or division instructions, which require multiple cycles to execute.
Deferred/split instructions try to address the inefficiency of instructions with latency by letting the compiler decide how to manage this latency.

Split instructions

Split instructions "split" a given instruction into two parts: (1) issue the instruction and (2) get the result.
E.g., loads could be split into issuing the load (lwc, load word from data cache) and then moving the loaded value into a register (glw, get loaded word).
The compiler can then schedule the two parts of the load independently, trying to hide the latency by issuing other instructions between them.

Example:

lwc t1 = [r1]    ; issue load of address in r1 to load-register t1
add r2 = r3, r4  ; do something else
add r2 = r2, r5  ; do something else
glw r1 = t1      ; get loaded value from load-register t1 into register r1
add r2 = r2, r1  ; use loaded value
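To make the scheduling benefit concrete, here is a toy C model of the split-load example. It is a sketch under stated assumptions: the load register t1 becomes valid a fixed `latency` slots after lwc issues, and glw stalls for any remaining cycles. The names `load_reg`, `lwc_issue`, `glw`, and `split_example` are hypothetical and do not describe the actual Patmos pipeline or simulator.

```c
#include <stdint.h>

/* Toy model of the split-load example (hypothetical: the latency value and
   the stall rule are assumptions, not the Patmos pipeline). */
typedef struct {
    int32_t value;   /* contents of load register t1 */
    int ready_slot;  /* first slot in which t1 holds the loaded value */
} load_reg;

/* lwc t1 = [addr] issued in `slot`: the value arrives `latency` slots later. */
static void lwc_issue(load_reg *t1, const int32_t *mem, int addr,
                      int slot, int latency) {
    t1->value = mem[addr];
    t1->ready_slot = slot + latency;
}

/* glw rd = t1 executed in `slot`: returns the loaded value and reports how
   many cycles the pipeline would stall waiting for it. */
static int32_t glw(const load_reg *t1, int slot, int *stalls) {
    *stalls = slot < t1->ready_slot ? t1->ready_slot - slot : 0;
    return t1->value;
}

/* Replays the five-instruction example and returns the final r2. */
static int32_t split_example(const int32_t *mem, int r1,
                             int32_t r3, int32_t r4, int32_t r5,
                             int latency, int *stalls) {
    load_reg t1;
    lwc_issue(&t1, mem, r1, 0, latency);   /* slot 0: lwc t1 = [r1]   */
    int32_t r2 = r3 + r4;                  /* slot 1: add r2 = r3, r4 */
    r2 = r2 + r5;                          /* slot 2: add r2 = r2, r5 */
    int32_t loaded = glw(&t1, 3, stalls);  /* slot 3: glw r1 = t1     */
    return r2 + loaded;                    /* slot 4: add r2 = r2, r1 */
}
```

With a latency of 2 (roughly the one-cycle delay slot mentioned above), the two independent adds hide the load completely; with a longer latency, only the remaining cycles show up as stalls at glw.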

Deferred instructions

Deferred instructions address the same problem with a different approach. In addition to the usual operands, the instruction is given an immediate operand that specifies when its result is expected.
The immediate value defines after how many instruction words the result should be available in the target register. The compiler can then issue the instruction early and choose the immediate so that the result arrives at the point in the instruction stream where it is needed.

Example:

lwc r1 = [r1], 3 ; Issue a deferred load, with the value available to the third following instruction
add r2 = r3, r4  ; do something else
add r2 = r2, r5  ; do something else
add r2 = r2, r1  ; use loaded value

The deferral range is not specified yet, but suitable ranges could be between 32 and 256.
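A similar toy model for the deferred variant, again a hypothetical sketch rather than a proposed implementation: it assumes a load issued in slot s with deferral d updates its destination register just before slot s + d executes, so the consumer has to sit at (or after) that slot. Register and memory contents, and the names `toy_core` and `deferred_example`, are made up for illustration; the 32 to 256 range above is not modeled.

```c
#include <stdint.h>

/* Toy model of a deferred load (assumed semantics: a load issued in slot s
   with deferral d writes its destination just before slot s + d executes,
   so that instruction is the first to see the new value). */
typedef struct {
    int32_t regs[32];
    int pending_rd;       /* destination of the in-flight load, -1 if none */
    int32_t pending_val;  /* value the load will write */
    int pending_slot;     /* first slot that observes the loaded value */
} toy_core;

/* Commit the in-flight load once execution reaches its target slot. */
static void commit_pending(toy_core *c, int slot) {
    if (c->pending_rd >= 0 && slot >= c->pending_slot) {
        c->regs[c->pending_rd] = c->pending_val;
        c->pending_rd = -1;
    }
}

/* Replays the four-instruction example and returns the final r2. */
static int32_t deferred_example(const int32_t *mem, int32_t r1, int32_t r3,
                                int32_t r4, int32_t r5, int deferral) {
    toy_core c = {{0}, -1, 0, 0};
    c.regs[1] = r1;
    c.regs[3] = r3;
    c.regs[4] = r4;
    c.regs[5] = r5;

    /* slot 0: lwc r1 = [r1], deferral */
    c.pending_rd = 1;
    c.pending_val = mem[c.regs[1]];
    c.pending_slot = 0 + deferral;

    commit_pending(&c, 1); c.regs[2] = c.regs[3] + c.regs[4]; /* slot 1 */
    commit_pending(&c, 2); c.regs[2] = c.regs[2] + c.regs[5]; /* slot 2 */
    commit_pending(&c, 3); c.regs[2] = c.regs[2] + c.regs[1]; /* slot 3 */
    return c.regs[2];
}
```

With a deferral of 3, the final add sees the loaded value, matching the comment in the example; with a deferral of 4, the same add still reads the old r1 in this model, which is why the compiler has to pick the immediate to match where the result is consumed.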

Merge and cleanup TTEthernet code from Maja

There are two merge conflicts that need to be resolved:

both modified:   hardware/config/altde2-all.xml
both modified:   hardware/src/main/scala/io/EthMac.scala

The whole merge adds quite a few files:

modified:   c/ethlib/eth_mac_driver.c
modified:   c/ethlib/eth_patmos_io.c
modified:   c/ethlib/eth_patmos_io.h
new file:   c/ethlib/tte.c
new file:   c/ethlib/tte.h
new file:   c/ethlib_tte_demo.c
new file:   c/ethlib_tte_demo_interrupts.c
new file:   c/ethlib_tte_demo_latency.c
new file:   c/ethlib_tte_wcet.c
new file:   hardware/config/altde2-interrupt.xml
new file:   hardware/config/altde2-latency.xml
new file:   hardware/config/default-no-timer.xml
modified:   hardware/ethmac/eth_controller_top.vhdl
new file:   hardware/ethmac/eth_controller_top2.vhdl
new file:   hardware/quartus/altde2-interrupt/patmos.qpf
new file:   hardware/quartus/altde2-interrupt/patmos.qsf
new file:   hardware/quartus/altde2-interrupt/patmos.sdc
new file:   hardware/quartus/altde2-latency/patmos.qpf
new file:   hardware/quartus/altde2-latency/patmos.qsf
new file:   hardware/quartus/altde2-latency/patmos.sdc
new file:   hardware/src/main/scala/io/EthMac2.scala
new file:   hardware/vhdl/patmos_de2-interrupt.vhdl
new file:   hardware/vhdl/patmos_de2-latency.vhdl
new file:   wcet/analyse_wcet.sh
new file:   wcet/config_de2_115.pml
  • The cleanup should remove unneeded top-level VHDL and configuration files.
  • The example application shall be a folder in app and not part of the Ethernet library.

Argo integration

Most VHDL top-level files still contain the port to Argo, which shall be removed.

However, before changing them all, we should decide which ones we will drop.

Generated multi-core Patmos.v has trailing commas in unused IO device instantiation

When generating a multi-core Patmos (e.g., 2x2), the instantiations of the UART, LED, and KEYS IO devices for cores that do not have access to them (i.e., cores 1, 2, and 3) end with an erroneous trailing comma, as shown in the figure, and Vivado 2018.2 reports an error during simulation.

[Figure: patmosgenverilogomulticoreio_bug]

Perhaps devices that are not used by a core should not be instantiated at all?

Quartus 20.1 failing to flash Patmos onto DE2-115

Discussed in #94

Originally posted by LehrChristoph September 14, 2021
Hi all,

I started developing a container image that provides all dependencies and has the compiler etc. set up. With the setup script, more or less everything worked fine, but when I synthesize Patmos, the Quartus 20.1 fitter takes around 17 minutes on my AMD Ryzen 7 5800X, whereas with Quartus 19.1 it takes only 7-8 seconds.

Additionally, when I flash the hello puts example onto my DE2-115 board, the process exits with an error; with 19.1 it exits normally.

This behaviour is quite strange to me, and I'm a little lost looking for the cause, since Quartus 20.1 should still support the Cyclone IV FPGA family.

pasim: Bundled pointer dereference followed by multiply doesn't correctly dereference

If we load the address of a label into a register as part of a bundle, dereference that register, and then multiply the loaded value, pasim does not execute the sequence correctly.

Take this program (main.c):

#include <stdio.h>
volatile int _1 = 1;
int init_func(){
  int x;
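  asm volatile(
    "{add $r3 = $r0, _1\n"  // bundle, first slot: r3 = address of _1
    "nop}\n"                // bundle, second slot: nop
    "lwc $r5 = [$r3]\n"     // r5 = _1 = 1 (load through the data cache)
    "li $r4 = 2\n"          // r4 = 2
    "mul $r4, $r5\n"        // multiply; s2 receives the lower half of the product
    "nop\n"
    "mfs %[x] = $s2\n"      // x = s2
    :[x] "=r" (x)
    :
    :"$r3", "$r4", "$r5", "$s2", "$s1"
  );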
  return x;
}
// Should print "2" for correct execution
int main(){
  printf("%d\n", init_func());
}

Looking at the inline assembly, we start by loading the address of the label _1 into r3. We then dereference r3 into r5, so r5 = 1. We then load r4 = 2 and multiply r4 by r5, which should put the value 2 into the special register s2 (the lower half of the mul result).
The code compiles successfully with patmos-clang main.c, but running it in pasim (pasim a.out) prints 0 where we would expect 2.
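The expected value can be written down as a small reference model in C. This assumes mul behaves as a signed 32x32-to-64-bit multiply; only the fact that s2 receives the lower half of the product comes from the report, and the function names `mul_low_word` and `expected_s2` are hypothetical.

```c
#include <stdint.h>

/* Reference model (assumption: signed 32x32 -> 64-bit multiply; per the
   report, s2 receives the lower 32 bits of the product). */
static uint32_t mul_low_word(int32_t a, int32_t b) {
    int64_t product = (int64_t)a * (int64_t)b;
    return (uint32_t)((uint64_t)product & 0xffffffffu);
}

/* Values from the inline assembly above. */
static uint32_t expected_s2(void) {
    int32_t r5 = 1;  /* lwc $r5 = [$r3], where r3 holds the address of _1 */
    int32_t r4 = 2;  /* li  $r4 = 2 */
    return mul_low_word(r4, r5);
}
```

Comparing pasim's s2 against such a model makes the failure unambiguous: the model yields 2, while pasim reports 0.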

This error is very specific. If we do any one of the following, the code executes correctly in pasim:

  • Move the add out of the bundle. As long as the add is bundled, the program fails no matter which other instruction shares the bundle.
  • Use any other operation instead of the mul. E.g., if we replace the mul with an add, the correct result is printed.
  • Use a specific immediate value or a register in the add instead of a label.

Judging from pasim's debug output, I suspect the problem lies in how the multiply pipeline stalls the regular pipeline, but I am not sure. I will investigate.

Generating Aegean/t-crest makes the emulator build fail

Running make platform in Aegean generates the multi-core system. A side effect is that make emulator fails in Patmos.

The file patmos/hardware/emulator_config.h is generated by the Aegean build process and makes the build fail by taking precedence over the correct configuration file (in patmos/hardware/build/).

The configuration file is generated by patmos/hardware/src/patmos/Config.scala at line 254. The patmos generation is for some reason run with the cpp backend, but without a configured memory. This messes up the generated configuration header.

Workaround: Delete patmos/hardware/emulator_config.h and run make emulator.

Python 2

We used Python 2 in Aegean, but Aegean is gone now. I believe we do not use Python anywhere else, so we should remove the dependency from the README and the handbook.

Handbook: Calling convention regarding SRB,SRO,SXB,SXO

The handbook says:

  • The return information registers s7-s10 (srb, sro, sxb, sxo) are callee-saved saved registers.

This doesn't make sense to me, as a call would overwrite the registers with the callee's return information, making the callee unable to save the caller's return information.
After talking with @schoeberl, this seems to be a mistake, so I propose rewording it to caller-saved.
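The argument can be illustrated with a toy model of the return-information registers. This is a hypothetical sketch: it assumes a call writes the return base and offset of the function making the call into srb and sro, and the addresses and names (`ret_info`, `call`, `f_return_address`) are invented. Function f is both a callee (of main) and a caller (of g); only f, acting as caller, is in a position to save its own return information before the call to g overwrites it.

```c
#include <stdint.h>

/* Toy model: a call overwrites srb/sro with the return point of the
   function making the call (addresses are hypothetical). */
typedef struct {
    uint32_t srb;  /* return base: start of the calling function */
    uint32_t sro;  /* return offset within that function */
} ret_info;

static void call(ret_info *r, uint32_t caller_base, uint32_t return_offset) {
    r->srb = caller_base;
    r->sro = return_offset;
}

/* main (base 0x100) calls f, returning at offset 0x10; f (base 0x200) then
   calls g, returning at offset 0x08. Returns the address f returns to. */
static uint32_t f_return_address(int f_saves_ret_info) {
    ret_info regs = {0, 0};
    call(&regs, 0x100, 0x10);  /* main -> f */
    ret_info saved = regs;     /* what f spills if it saves (caller-saved) */
    call(&regs, 0x200, 0x08);  /* f -> g: srb/sro now point back into f */
    /* g returns to regs.srb + regs.sro; it never saw main's values */
    if (f_saves_ret_info)
        regs = saved;          /* f restores before its own return */
    return regs.srb + regs.sro;
}
```

If f saves srb/sro before calling g, it returns to main (0x110); if it relies on g to preserve them, it returns into itself (0x208), because g never had access to the old values.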

`brcf`, `brcfnd` documentation doesn't match LLVM implementation

The handbook documentation for brcf and brcfnd says that these instructions use the formats CFLi (immediate operand) and CFLrt (two register operands). However, in inline assembly, giving them two register operands produces an error:

<inline asm>:35:16: error: invalid operand for instruction or syntax mismatch
        brcfnd  $r1,  $r10
<inline asm>:36:16: error: invalid operand for instruction or syntax mismatch
        brcf    $r12, $r12

whereas giving them a single register operand (matching the CFLrs format) compiles without issue and is even accepted by patmos-llvm-objdump as a valid instruction.

So, is the documentation wrong, or is it the implementation?
If the documentation is wrong, is the CFLrt format used by any instructions at all? (No other instructions in the handbook use it.)
If the implementation is wrong, what would the semantics of the CFLrt formats be?

Strange timing behavior in `patemu` when executing same code on multiple cores

I am observing a behavior that I cannot explain when executing the same code on multiple cores in patemu. I was expecting that more cores executing the same code in parallel would slow down the execution on each core due to the increased number of memory accesses. However, the opposite seems to happen: code executed on just one core takes longer than the same code run in parallel on multiple cores.

Steps to reproduce (starting from a clean build of the most recent version, i.e., commit 4e8f8d9):

  1. Edit patmos/hardware/config/altde2-115.xml and uncomment lines 7 and 8 as follows to build a multicore system (note that Argo is not used since it produces an error, see #104):
<!-- Default is single core -->
<pipeline dual="false" />
<cores count="8"/>
<!--<CmpDevs>
<CmpDev name="Argo" />
</CmpDevs> -->
  2. After rebuilding with misc/build.sh, save the following code to a file called test.c (sorry for the huge file, I tried to keep it as short as possible while still being able to reproduce the behavior):
#include <stdio.h>
#include <stdint.h>
#include <machine/patmos.h>
#include <machine/rtc.h>

#include "libcorethread/corethread.h"


void test(uint8_t *data) {
    int round;
    for (round = 1; 1; round++) {
        {
            uint8_t tmp = data[0];
            int i;
            for (i = 0; i < 15; i++) {
                data[i] = data[i+1];
            }
            data[15] = tmp;
        }

        if (round == 10)
            break;

        {
            int i, j;
            for (i = 0; i < 4; i++) {
                uint8_t *col = data + (i * 4);
                uint8_t copy[4];
                for (j = 0; j < 4; j++) {
                    copy[j] = col[j];
                }
                col[0] = copy[3] ^ copy[2] ^ copy[1];
                col[1] = copy[0] ^ copy[3] ^ copy[2];
                col[2] = copy[1] ^ copy[0] ^ copy[3];
                col[3] = copy[2] ^ copy[1] ^ copy[0];
            }
        }
    }
}


static uint8_t test_data[MAX_CPUS * 16];

volatile _UNCACHED static unsigned long long t_start[MAX_CPUS];
volatile _UNCACHED static unsigned long long t_end  [MAX_CPUS];

void work(void* arg) {
    int cpuid = get_cpuid();

    unsigned long long start, end;

    // have all CPUs start roughly at the same time
    while ((start = get_cpu_cycles()) < 1000000)
        ;

    test(((uint8_t *)test_data) + (cpuid * 16));

    end = get_cpu_cycles();

    // wait for other CPUs to finish
    while (get_cpu_cycles() < end + 1000000)
        ;

    t_start[cpuid] = start;
    t_end  [cpuid] = end;
}

int main() {
    int i, core_cnt = get_cpucnt();
    if (core_cnt > MAX_CPUS) {
        core_cnt = MAX_CPUS;
    }

    printf("Starting threads on %d CPUs\n", core_cnt);

    for (i = 1; i < core_cnt; i++) {
        corethread_create(i, &work, NULL);
    }

    work(NULL);

    int ret;
    for (i = 1; i < core_cnt; i++) {
        corethread_join(i, (void *)&ret);
    }

    printf("Threads joined\n");
    for (i = 0; i < core_cnt; i++) {
        printf(" start time: %llu, duration: %llu\n", t_start[i], t_end[i] - t_start[i]);
    }
    return 0;
}
  3. Run the following commands to compile and execute using patemu:
$ patmos-clang -O3 test.c libcorethread/corethread.c -DMAX_CPUS=1
$ patemu a.out 
Starting threads on 1 CPUs
Threads joined
 start time: 1000005, duration: 108301
$ patmos-clang -O3 test.c libcorethread/corethread.c -DMAX_CPUS=2
$ patemu a.out 
Starting threads on 2 CPUs
Threads joined
 start time: 1000000, duration: 92199
 start time: 1000000, duration: 92220
$ patmos-clang -O3 test.c libcorethread/corethread.c -DMAX_CPUS=4
$ patemu a.out 
Starting threads on 4 CPUs
Threads joined
 start time: 1000000, duration: 92199
 start time: 1000000, duration: 92220
 start time: 1000005, duration: 92236
 start time: 1000002, duration: 92260

For the last command, four CPUs execute the same code. The execution time for that code should be higher than, or at least equal to, the time when a single CPU executes it. However, the execution time on a single CPU is significantly higher than on multiple CPUs (108301 vs. about 92200 cycles, roughly 17% more). Note that the execution time does not vary as more CPUs are added; there is only a large difference between one and multiple CPUs.

Disclaimer: I assume that the value returned by get_cpu_cycles() is equal when called simultaneously by two cores; please correct me if this is wrong.

Patmos emulator fails to build with Argo

According to Section 1.2.8 of the Patmos Reference Handbook, to enable a multi-core Patmos one should "uncomment following lines in the configuration file:"

<pipeline dual="false" />
<cores count="4"/>
<CmpDevs>
<CmpDev name="Argo" />
</CmpDevs>

However, if I do that, building the Patmos emulator fails. I first built with the original configuration to verify that everything works and then uncommented the lines as instructed in the configuration file. When rebuilding the Patmos emulator, I get the following error:

[info] [0.003] Elaborating design...
IO device Timer: entity Timer, offset 2, params Map(), interrupts: List(0, 1), all cores
IO device Deadline: entity Deadline, offset 3, params Map(), all cores
IO device Sram16: entity SRamCtrl, offset -1, params Map(ocpAddrWidth -> 21, sramAddrWidth -> 20, sramDataWidth -> 16), core 0
IO device Leds: entity Leds, offset 9, params Map(ledCount -> 9), core 0
IO device Keys: entity Keys, offset 10, params Map(keyCount -> 4), interrupts: List(2, 3, 4, 5), core 0
Config core count: 4
Reading /home/michael/Documents/t-crest-2021/patmos/hardware/../tmp/bootable-bootloader.bin
Reading /home/michael/Documents/t-crest-2021/patmos/hardware/../tmp/bootable-bootloader.bin
Reading /home/michael/Documents/t-crest-2021/patmos/hardware/../tmp/bootable-bootloader.bin
Reading /home/michael/Documents/t-crest-2021/patmos/hardware/../tmp/bootable-bootloader.bin
Config cmp: 
device: Argo
Argo connecting 4 Patmos islands with configuration:
N=2
M=2
SPM_SIZE (Bytes)=4096
Emulation is false
o--Instantiating Nodes
|---Node #0 @ (0,0)
|---Node #1 @ (0,1)
|---Node #2 @ (1,0)
|---Node #3 @ (1,1)
o--Building Interconnect
[error] java.io.IOException: error=2, No such file or directory
[error] 	...
[error] 	at argo.ArgoConfig$.genPoseidonSched(ArgoConfig.scala:157)
[error] 	at argo.Argo.<init>(Argo.scala:136)
[error] 	at patmos.Patmos.$anonfun$cmpdevios$2(Patmos.scala:257)
[error] 	at chisel3.Module$.do_apply(Module.scala:54)
[error] 	at patmos.Patmos.$anonfun$cmpdevios$1(Patmos.scala:257)
[error] 	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:285)
[error] 	at scala.collection.immutable.Set$Set3.foreach(Set.scala:233)
[error] 	at scala.collection.TraversableLike.map(TraversableLike.scala:285)
[error] 	at scala.collection.TraversableLike.map$(TraversableLike.scala:278)
[error] 	at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:53)
[error] 	at scala.collection.SetLike.map(SetLike.scala:105)
[error] 	at scala.collection.SetLike.map$(SetLike.scala:105)
[error] 	at scala.collection.AbstractSet.map(Set.scala:53)
[error] 	at patmos.Patmos.<init>(Patmos.scala:254)
[error] 	at patmos.PatmosMain$.$anonfun$new$118(Patmos.scala:580)
[error] 	... (Stack trace trimmed to user code only, rerun with --full-stacktrace if you wish to see the full stack trace)
[error] (run-main-0) firrtl.options.StageError: 
[error] firrtl.options.StageError: 
[error] 	at chisel3.stage.ChiselStage.run(ChiselStage.scala:60)
[error] 	at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] 	at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.Translator.transform(Phase.scala:248)
[error] 	at firrtl.options.Translator.transform$(Phase.scala:248)
[error] 	at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.Stage.$anonfun$transform$5(Stage.scala:47)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] 	at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] 	at firrtl.options.Stage.$anonfun$transform$3(Stage.scala:47)
[error] 	at logger.Logger$.$anonfun$makeScope$2(Logger.scala:166)
[error] 	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
[error] 	at logger.Logger$.makeScope(Logger.scala:164)
[error] 	at firrtl.options.Stage.transform(Stage.scala:47)
[error] 	at firrtl.options.Stage.execute(Stage.scala:58)
[error] 	at chisel3.stage.ChiselStage.emitVerilog(ChiselStage.scala:117)
[error] 	at patmos.PatmosMain$.delayedEndpoint$patmos$PatmosMain$1(Patmos.scala:580)
[error] 	at patmos.PatmosMain$delayedInit$body.apply(Patmos.scala:571)
[error] 	at scala.Function0.apply$mcV$sp(Function0.scala:39)
[error] 	at scala.Function0.apply$mcV$sp$(Function0.scala:39)
[error] 	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
[error] 	at scala.App.$anonfun$main$1$adapted(App.scala:80)
[error] 	at scala.collection.immutable.List.foreach(List.scala:431)
[error] 	at scala.App.main(App.scala:80)
[error] 	at scala.App.main$(App.scala:78)
[error] 	at patmos.PatmosMain$.main(Patmos.scala:571)
[error] 	at patmos.PatmosMain.main(Patmos.scala)
[error] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] 	at java.lang.reflect.Method.invoke(Method.java:498)
[error] Caused by: chisel3.internal.ChiselException: Exception thrown when elaborating ChiselGeneratorAnnotation
[error] 	at chisel3.stage.ChiselGeneratorAnnotation.elaborate(ChiselAnnotations.scala:65)
[error] 	at chisel3.stage.phases.Elaborate.$anonfun$transform$1(Elaborate.scala:24)
[error] 	at scala.collection.immutable.List.flatMap(List.scala:366)
[error] 	at chisel3.stage.phases.Elaborate.transform(Elaborate.scala:23)
[error] 	at chisel3.stage.phases.Elaborate.transform(Elaborate.scala:16)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.Translator.transform(Phase.scala:248)
[error] 	at firrtl.options.Translator.transform$(Phase.scala:248)
[error] 	at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.DependencyManager.$anonfun$transform$3(DependencyManager.scala:278)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] 	at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] 	at firrtl.options.DependencyManager.transform(DependencyManager.scala:269)
[error] 	at firrtl.options.DependencyManager.transform$(DependencyManager.scala:255)
[error] 	at firrtl.options.PhaseManager.transform(DependencyManager.scala:436)
[error] 	at chisel3.stage.ChiselStage.run(ChiselStage.scala:46)
[error] 	at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] 	at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.Translator.transform(Phase.scala:248)
[error] 	at firrtl.options.Translator.transform$(Phase.scala:248)
[error] 	at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.Stage.$anonfun$transform$5(Stage.scala:47)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] 	at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] 	at firrtl.options.Stage.$anonfun$transform$3(Stage.scala:47)
[error] 	at logger.Logger$.$anonfun$makeScope$2(Logger.scala:166)
[error] 	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
[error] 	at logger.Logger$.makeScope(Logger.scala:164)
[error] 	at firrtl.options.Stage.transform(Stage.scala:47)
[error] 	at firrtl.options.Stage.execute(Stage.scala:58)
[error] 	at chisel3.stage.ChiselStage.emitVerilog(ChiselStage.scala:117)
[error] 	at patmos.PatmosMain$.delayedEndpoint$patmos$PatmosMain$1(Patmos.scala:580)
[error] 	at patmos.PatmosMain$delayedInit$body.apply(Patmos.scala:571)
[error] 	at scala.Function0.apply$mcV$sp(Function0.scala:39)
[error] 	at scala.Function0.apply$mcV$sp$(Function0.scala:39)
[error] 	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
[error] 	at scala.App.$anonfun$main$1$adapted(App.scala:80)
[error] 	at scala.collection.immutable.List.foreach(List.scala:431)
[error] 	at scala.App.main(App.scala:80)
[error] 	at scala.App.main$(App.scala:78)
[error] 	at patmos.PatmosMain$.main(Patmos.scala:571)
[error] 	at patmos.PatmosMain.main(Patmos.scala)
[error] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] 	at java.lang.reflect.Method.invoke(Method.java:498)
[error] Caused by: java.io.IOException: Cannot run program "../../local/bin/poseidon": error=2, No such file or directory
[error] 	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
[error] 	at scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:75)
[error] 	at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.run(ProcessBuilderImpl.scala:104)
[error] 	at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang(ProcessBuilderImpl.scala:118)
[error] 	at argo.ArgoConfig$.genPoseidonSched(ArgoConfig.scala:157)
[error] 	at argo.Argo.<init>(Argo.scala:136)
[error] 	at patmos.Patmos.$anonfun$cmpdevios$2(Patmos.scala:257)
[error] 	at chisel3.Module$.do_apply(Module.scala:54)
[error] 	at patmos.Patmos.$anonfun$cmpdevios$1(Patmos.scala:257)
[error] 	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:285)
[error] 	at scala.collection.immutable.Set$Set3.foreach(Set.scala:233)
[error] 	at scala.collection.TraversableLike.map(TraversableLike.scala:285)
[error] 	at scala.collection.TraversableLike.map$(TraversableLike.scala:278)
[error] 	at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:53)
[error] 	at scala.collection.SetLike.map(SetLike.scala:105)
[error] 	at scala.collection.SetLike.map$(SetLike.scala:105)
[error] 	at scala.collection.AbstractSet.map(Set.scala:53)
[error] 	at patmos.Patmos.<init>(Patmos.scala:254)
[error] 	at patmos.PatmosMain$.$anonfun$new$118(Patmos.scala:580)
[error] 	at chisel3.Module$.do_apply(Module.scala:54)
[error] 	at chisel3.stage.ChiselGeneratorAnnotation.$anonfun$elaborate$1(ChiselAnnotations.scala:60)
[error] 	at chisel3.internal.Builder$.$anonfun$build$1(Builder.scala:642)
[error] 	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
[error] 	at chisel3.internal.Builder$.build(Builder.scala:639)
[error] 	at chisel3.internal.Builder$.build(Builder.scala:635)
[error] 	at chisel3.stage.ChiselGeneratorAnnotation.elaborate(ChiselAnnotations.scala:60)
[error] 	at chisel3.stage.phases.Elaborate.$anonfun$transform$1(Elaborate.scala:24)
[error] 	at scala.collection.immutable.List.flatMap(List.scala:366)
[error] 	at chisel3.stage.phases.Elaborate.transform(Elaborate.scala:23)
[error] 	at chisel3.stage.phases.Elaborate.transform(Elaborate.scala:16)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.Translator.transform(Phase.scala:248)
[error] 	at firrtl.options.Translator.transform$(Phase.scala:248)
[error] 	at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.DependencyManager.$anonfun$transform$3(DependencyManager.scala:278)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] 	at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] 	at firrtl.options.DependencyManager.transform(DependencyManager.scala:269)
[error] 	at firrtl.options.DependencyManager.transform$(DependencyManager.scala:255)
[error] 	at firrtl.options.PhaseManager.transform(DependencyManager.scala:436)
[error] 	at chisel3.stage.ChiselStage.run(ChiselStage.scala:46)
[error] 	at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] 	at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.Translator.transform(Phase.scala:248)
[error] 	at firrtl.options.Translator.transform$(Phase.scala:248)
[error] 	at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.Stage.$anonfun$transform$5(Stage.scala:47)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] 	at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] 	at firrtl.options.Stage.$anonfun$transform$3(Stage.scala:47)
[error] 	at logger.Logger$.$anonfun$makeScope$2(Logger.scala:166)
[error] 	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
[error] 	at logger.Logger$.makeScope(Logger.scala:164)
[error] 	at firrtl.options.Stage.transform(Stage.scala:47)
[error] 	at firrtl.options.Stage.execute(Stage.scala:58)
[error] 	at chisel3.stage.ChiselStage.emitVerilog(ChiselStage.scala:117)
[error] 	at patmos.PatmosMain$.delayedEndpoint$patmos$PatmosMain$1(Patmos.scala:580)
[error] 	at patmos.PatmosMain$delayedInit$body.apply(Patmos.scala:571)
[error] 	at scala.Function0.apply$mcV$sp(Function0.scala:39)
[error] 	at scala.Function0.apply$mcV$sp$(Function0.scala:39)
[error] 	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
[error] 	at scala.App.$anonfun$main$1$adapted(App.scala:80)
[error] 	at scala.collection.immutable.List.foreach(List.scala:431)
[error] 	at scala.App.main(App.scala:80)
[error] 	at scala.App.main$(App.scala:78)
[error] 	at patmos.PatmosMain$.main(Patmos.scala:571)
[error] 	at patmos.PatmosMain.main(Patmos.scala)
[error] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] 	at java.lang.reflect.Method.invoke(Method.java:498)
[error] Caused by: java.io.IOException: error=2, No such file or directory
[error] 	at java.lang.UNIXProcess.forkAndExec(Native Method)
[error] 	at java.lang.UNIXProcess.<init>(UNIXProcess.java:247)
[error] 	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
[error] 	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
[error] 	at scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:75)
[error] 	at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.run(ProcessBuilderImpl.scala:104)
[error] 	at scala.sys.process.ProcessBuilderImpl$AbstractBuilder.$bang(ProcessBuilderImpl.scala:118)
[error] 	at argo.ArgoConfig$.genPoseidonSched(ArgoConfig.scala:157)
[error] 	at argo.Argo.<init>(Argo.scala:136)
[error] 	at patmos.Patmos.$anonfun$cmpdevios$2(Patmos.scala:257)
[error] 	at chisel3.Module$.do_apply(Module.scala:54)
[error] 	at patmos.Patmos.$anonfun$cmpdevios$1(Patmos.scala:257)
[error] 	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:285)
[error] 	at scala.collection.immutable.Set$Set3.foreach(Set.scala:233)
[error] 	at scala.collection.TraversableLike.map(TraversableLike.scala:285)
[error] 	at scala.collection.TraversableLike.map$(TraversableLike.scala:278)
[error] 	at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:53)
[error] 	at scala.collection.SetLike.map(SetLike.scala:105)
[error] 	at scala.collection.SetLike.map$(SetLike.scala:105)
[error] 	at scala.collection.AbstractSet.map(Set.scala:53)
[error] 	at patmos.Patmos.<init>(Patmos.scala:254)
[error] 	at patmos.PatmosMain$.$anonfun$new$118(Patmos.scala:580)
[error] 	at chisel3.Module$.do_apply(Module.scala:54)
[error] 	at chisel3.stage.ChiselGeneratorAnnotation.$anonfun$elaborate$1(ChiselAnnotations.scala:60)
[error] 	at chisel3.internal.Builder$.$anonfun$build$1(Builder.scala:642)
[error] 	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
[error] 	at chisel3.internal.Builder$.build(Builder.scala:639)
[error] 	at chisel3.internal.Builder$.build(Builder.scala:635)
[error] 	at chisel3.stage.ChiselGeneratorAnnotation.elaborate(ChiselAnnotations.scala:60)
[error] 	at chisel3.stage.phases.Elaborate.$anonfun$transform$1(Elaborate.scala:24)
[error] 	at scala.collection.immutable.List.flatMap(List.scala:366)
[error] 	at chisel3.stage.phases.Elaborate.transform(Elaborate.scala:23)
[error] 	at chisel3.stage.phases.Elaborate.transform(Elaborate.scala:16)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.Translator.transform(Phase.scala:248)
[error] 	at firrtl.options.Translator.transform$(Phase.scala:248)
[error] 	at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.DependencyManager.$anonfun$transform$3(DependencyManager.scala:278)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] 	at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] 	at firrtl.options.DependencyManager.transform(DependencyManager.scala:269)
[error] 	at firrtl.options.DependencyManager.transform$(DependencyManager.scala:255)
[error] 	at firrtl.options.PhaseManager.transform(DependencyManager.scala:436)
[error] 	at chisel3.stage.ChiselStage.run(ChiselStage.scala:46)
[error] 	at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] 	at firrtl.options.Stage$$anon$1.transform(Stage.scala:43)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:38)
[error] 	at firrtl.options.phases.DeletedWrapper.internalTransform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.Translator.transform(Phase.scala:248)
[error] 	at firrtl.options.Translator.transform$(Phase.scala:248)
[error] 	at firrtl.options.phases.DeletedWrapper.transform(DeletedWrapper.scala:15)
[error] 	at firrtl.options.Stage.$anonfun$transform$5(Stage.scala:47)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
[error] 	at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
[error] 	at scala.collection.immutable.List.foldLeft(List.scala:91)
[error] 	at firrtl.options.Stage.$anonfun$transform$3(Stage.scala:47)
[error] 	at logger.Logger$.$anonfun$makeScope$2(Logger.scala:166)
[error] 	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
[error] 	at logger.Logger$.makeScope(Logger.scala:164)
[error] 	at firrtl.options.Stage.transform(Stage.scala:47)
[error] 	at firrtl.options.Stage.execute(Stage.scala:58)
[error] 	at chisel3.stage.ChiselStage.emitVerilog(ChiselStage.scala:117)
[error] 	at patmos.PatmosMain$.delayedEndpoint$patmos$PatmosMain$1(Patmos.scala:580)
[error] 	at patmos.PatmosMain$delayedInit$body.apply(Patmos.scala:571)
[error] 	at scala.Function0.apply$mcV$sp(Function0.scala:39)
[error] 	at scala.Function0.apply$mcV$sp$(Function0.scala:39)
[error] 	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
[error] 	at scala.App.$anonfun$main$1$adapted(App.scala:80)
[error] 	at scala.collection.immutable.List.foreach(List.scala:431)
[error] 	at scala.App.main(App.scala:80)
[error] 	at scala.App.main$(App.scala:78)
[error] 	at patmos.PatmosMain$.main(Patmos.scala:571)
[error] 	at patmos.PatmosMain.main(Patmos.scala)
[error] 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] 	at java.lang.reflect.Method.invoke(Method.java:498)
[error] Nonzero exit code: 1
[error] (Compile / runMain) Nonzero exit code: 1
[error] Total time: 6 s, completed Nov 4, 2021 5:05:46 PM

Handbook: ABI inconsistency around r20

The handbook defines the register calling convention.

In section "4.2 Register Usage Conventions" it says:

  • r20 through r31 are callee-saved saved registers.

However, in section "2.3 Register Files", figure 2.2(a) labels r20 as (scratch), as if it were caller-saved.
It clearly cannot be both.

In the compiler, PatmosRegisterInfo.cpp defines Patmos::R21 through Patmos::R28 as callee-saved (in the function PatmosRegisterInfo::getCalleeSavedRegs).

Therefore, I suspect section 4.2 is wrong and should have r21 as the first callee-saved register.
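To make the discrepancy concrete, the following C sketch encodes the two register ranges quoted above as bit masks and computes which registers the handbook lists as callee-saved but getCalleeSavedRegs does not return. The function names are made up for illustration. Besides r20, the difference also contains r29 to r31; whether the compiler preserves those by other means is not something this sketch checks.

```c
#include <stdint.h>

/* Build a mask with bits lo..hi set; bit n stands for register rn. */
static uint32_t reg_range_mask(int lo, int hi) {
    uint32_t mask = 0;
    for (int r = lo; r <= hi; r++)
        mask |= UINT32_C(1) << r;
    return mask;
}

/* Registers listed as callee-saved in handbook section 4.2 (r20..r31)
   but missing from PatmosRegisterInfo::getCalleeSavedRegs (R21..R28). */
uint32_t handbook_only_callee_saved(void) {
    uint32_t handbook = reg_range_mask(20, 31);
    uint32_t compiler = reg_range_mask(21, 28);
    return handbook & ~compiler;
}

/* Returns nonzero if register r is set in the mask. */
int reg_in_mask(uint32_t mask, int r) {
    return (int)((mask >> r) & 1u);
}
```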

XML parameter config for onboard memory issue

In Config.scala (line 244), every ExtMem device is now required to have an sramAddrWidth param:

     if (!(ExtMemNode \ "@DevTypeRef").isEmpty){
       ExtMemDev = devFromXML(ExtMemNode,DevList,false)
       ExtMemAddrWidth = ExtMemDev.params("sramAddrWidth")
     }

However, if an onboard memory is used, this leads to the awkward situation of needing two address-width parameters; otherwise, the build fails with a "key not found" error:

  <Dev DevType="OCRam" entity="OCRamCtrl" iface="OcpBurst">
     <params>
        <param name="addrWidth" value="19" />
        <param name="sramAddrWidth" value="19" />
     </params>
   </Dev>
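One way to avoid the duplicate parameter would be to fall back to addrWidth when sramAddrWidth is missing. The sketch below expresses that lookup in C rather than Scala to keep the logic short; the dev_param type and the function names are hypothetical, and the fallback order is a proposal, not the current behaviour of Config.scala.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the <param> entries of one <Dev> element. */
typedef struct {
    const char *name;
    int value;
} dev_param;

/* Look up a parameter by name; returns 1 and sets *out if found, else 0. */
int find_param(const dev_param *params, size_t n, const char *name, int *out) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(params[i].name, name) == 0) {
            *out = params[i].value;
            return 1;
        }
    }
    return 0;
}

/* Proposed lookup (assumption): use sramAddrWidth if present, otherwise
   fall back to addrWidth, so an on-chip memory such as OCRam only needs
   to declare a single address width. */
int ext_mem_addr_width(const dev_param *params, size_t n, int *out) {
    return find_param(params, n, "sramAddrWidth", out) ||
           find_param(params, n, "addrWidth", out);
}
```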

Hard coded IO constants

Replace all hard-coded constants in the C files with the definitions from machine/patmos.h.

Patmos emulator fails to build with direct-mapped instruction cache

Patmos emulator fails to build when the method cache is replaced by a direct-mapped instruction cache in hardware/config/default.xml:

-  <ICache type="method" size="8k" assoc="16" repl="fifo" />
+  <ICache type="line" size="8k" assoc="1" repl="dm" />

Several exceptions are raised during the build process:

Patmos configuration "default configuration for DE2-115 board"
	Frequency: 80 MHz
	Pipelines: 2
	Cores: 1
	Instruction cache: 8 KB, direct-mapped
	Data cache: 4 KB, direct-mapped, write through
	Stack cache: 2 KB
	Instruction SPM: 1 KB
	Data SPM: 2 KB
	Addressable external memory: 2 MB
	MMU: false
	Burst length: 4

[info] [2.097] Done elaborating.
[error] (run-main-0) firrtl.passes.PassExceptions: 
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 103:20]: [module ICache]  Expression ctrl.io.ctrlrepl is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 106:20]: [module ICache]  Expression ctrl.io.ocp_port is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 106:20]: [module ICache]  Expression io.ocp_port is used as a SourceFlow but can only be used as a SinkFlow.
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 107:16]: [module ICache]  Expression ctrl.io.perf is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 111:20]: [module ICache]  Expression repl.io.icachefe is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 112:20]: [module ICache]  Expression repl.io.replctrl is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 114:17]: [module ICache]  Expression repl.io.memIn is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.PassException: 7 errors detected!
[error] firrtl.passes.PassExceptions: 
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 103:20]: [module ICache]  Expression ctrl.io.ctrlrepl is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 106:20]: [module ICache]  Expression ctrl.io.ocp_port is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 106:20]: [module ICache]  Expression io.ocp_port is used as a SourceFlow but can only be used as a SinkFlow.
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 107:16]: [module ICache]  Expression ctrl.io.perf is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 111:20]: [module ICache]  Expression repl.io.icachefe is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 112:20]: [module ICache]  Expression repl.io.replctrl is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.CheckFlows$WrongFlow:  @[ICache.scala 114:17]: [module ICache]  Expression repl.io.memIn is used as a SinkFlow but can only be used as a SourceFlow.
[error] firrtl.passes.PassException: 7 errors detected!
[error] Nonzero exit code: 1
[error] (Compile / runMain) Nonzero exit code: 1
[error] Total time: 8 s, completed Dec 22, 2021 3:46:38 PM

ISA change proposal: Allow branches in the second issue slot

This issue will track the discussion about changing the Patmos ISA to allow branches in the second issue slot.

Motivation

The limitation of not allowing branch instructions in the second issue slot is somewhat "artificial", in that there is no technical reason for it (as there is for load instructions).
Allowing branches in the second issue slot can provide a significant benefit. One example is string copying, which can then be implemented in only 3 cycles per character (amortized):

         pmov   $p1 = $p0
loop:
{  ($p1) lbuc   $r1 = [$r2]     ; Load next char
   ($p1) br     loop           }; Loop if not done
{  ($p1) add    $r2 = $r2, 1    ; Increment source pointer
   ($p1) add    $r3 = $r3, 1   }; Increment target pointer
{  ($p1) sbc    [$r3-1] = $r1   ; Save char
         cmpneq $p1 = $r1, 0   }; Check whether null was reached

Note: There might be a better way to do this by loading 4 chars at a time, but this is the fastest way to do it while loading only 1 char at a time.
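For reference, the bundled loop computes the same result as the following C function: it copies bytes up to and including the terminating NUL. The comments map each statement to the assembly it roughly corresponds to; the function name is hypothetical. In the bundled version each copied byte costs one loop iteration of three bundles, which is where the amortized 3 cycles per character come from.

```c
#include <stddef.h>

/* Copy src to dst including the terminating NUL; return the bytes stored. */
size_t copy_string(char *dst, const char *src) {
    size_t stored = 0;
    char c;
    do {
        c = *src++;      /* lbuc $r1 = [$r2]   and  add $r2 = $r2, 1 */
        *dst++ = c;      /* add $r3 = $r3, 1   and  sbc [$r3-1] = $r1 */
        stored++;
    } while (c != '\0'); /* cmpneq $p1 = $r1, 0 decides whether to loop again */
    return stored;
}
```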

Todo:

  • Implement in simulator
  • Benchmark on simulator

Inconsistent forwarding

The test case vliw_test/add (and possibly other test cases) shows inconsistent forwarding in the simulator. In the first cycle, r1 is assigned 2 in pipeline 0 and 5 in pipeline 1. In the subsequent cycles, r2 to r7 are assigned r1+r1. The result is r1=5 (the result from pipeline 1), r2=r3=4 (apparently forwarded from pipeline 0), and r4=r5=r6=r7=10 (forwarded from pipeline 1 or read from the register file). This should be fixed so that the value in the register and the forwarded values are consistent.
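A minimal sketch of what consistent forwarding could look like, assuming (as the final r1=5 suggests) that pipeline 1 wins when both pipelines write the same register in one bundle. The wb_slot type and the function names are hypothetical and not taken from the simulator. The point is that the forwarding path and the register-file write-back apply the same priority, so every later read of r1 sees 5 and r2 to r7 all become 10.

```c
#include <stdint.h>

/* One pipeline's pending result in a two-way VLIW bundle (hypothetical). */
typedef struct {
    int valid;    /* does this pipeline write a register? */
    int rd;       /* destination register */
    uint32_t val; /* result value */
} wb_slot;

/* Forwarding: check pipeline 1 first, then pipeline 0, then the register
   file, i.e. the same priority that write_back() produces. */
uint32_t read_operand(int rs, const wb_slot slot[2], const uint32_t regs[32]) {
    if (rs == 0)
        return 0; /* r0 is hard-wired to zero */
    if (slot[1].valid && slot[1].rd == rs)
        return slot[1].val;
    if (slot[0].valid && slot[0].rd == rs)
        return slot[0].val;
    return regs[rs];
}

/* Write-back: pipeline 0 first, then pipeline 1, so pipeline 1 wins a conflict. */
void write_back(const wb_slot slot[2], uint32_t regs[32]) {
    for (int p = 0; p < 2; p++)
        if (slot[p].valid && slot[p].rd != 0)
            regs[slot[p].rd] = slot[p].val;
}
```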

pasim: doesn't take predicates into account when complaining about use of load without delay slot

The pasim simulator does not take predicates into account when it throws an error about use of load value without a delay slot.

Take this assembly program:

	.file	"main.bc"
	.text
	.globl	main
	.align	16
	.type	main,@function
	.fstart	main, .Ltmp0-main, 16
main:                                        # @main
# BB#0:                                      # %entry
        sres 8
        mfs $r9 = $s0
        sws [1] = $r9            # 4-byte Folded Spill
        sws [2] = $r26           # 4-byte Folded Spill
        li $r1 = main
        lwc $r2 = [$r1]      #Load of r2
  (!$p0)add $r1 = $r1, $r2   #Use of r2, throws error
        mov $r1 = $r0
        lws $r9 = [1]            # 4-byte Folded Reload
        ret	
        lws $r26 = [2]           # 4-byte Folded Reload
        mts $s0 = $r9
        sfree 8
.Ltmp0:
.Ltmp1:
	.size	main, .Ltmp1-main

Compiling and running the program with pasim will make it throw the following error:

Cycle 24502: Illegal instruction at 00052048<main + 0x24>: Use of load result without delay slot!
Stacktrace:
#0 0x52024 <main>(): $rsp 0x68 stack cache size 0x40
   at 0x52048 (base: 0x52024 <main>, offset: 0x24 <main + 0x24>)
#1 0x20124 <__start>(): $rsp 0x7fffffff stack cache size 0x0
   at 0x201cc (base: 0x20194 <__start:.LBB1_1:.Ltmp2 + 0x8>, offset: 0x38 <__start:.LBB1_1:.Ltmp2 + 0x40>)
#2 0x20084 <_start>(): $rsp 0x7fffffff stack cache size 0x0
   at 0x20110 (base: 0x20084 <.LBB0_0:_start>, offset: 0x8c <_start + 0x8c>)

The offending instruction is (!$p0)add $r1 = $r1, $r2. pasim's error does not point directly to it, but to the subsequent instruction. We can remove the error by introducing a nop between the load and the use:

	.file	"main.bc"
	.text
	.globl	main
	.align	16
	.type	main,@function
	.fstart	main, .Ltmp0-main, 16
main:                                        # @main
# BB#0:                                      # %entry
        sres 8
        mfs $r9 = $s0
        sws [1] = $r9            # 4-byte Folded Spill
        sws [2] = $r26           # 4-byte Folded Spill
        li $r1 = main
        lwc $r2 = [$r1]     #Load of r2
        nop
  (!$p0)add $r1 = $r1, $r2  #Use of r2, doesn't throw an error
        mov $r1 = $r0
        lws $r9 = [1]            # 4-byte Folded Reload
        ret	
        lws $r26 = [2]           # 4-byte Folded Reload
        mts $s0 = $r9
        sfree 8
.Ltmp0:
.Ltmp1:
	.size	main, .Ltmp1-main

Usually, this error is correct, because loads have a one-cycle load delay slot during which the destination register does not yet hold the loaded value. In our case, however, the using instruction is predicated, and we can see that it will never execute. Here this is even visible before running the program, since (!$p0) is always false, but the predicate could also depend on run-time values.
In my view, pasim should check the value of the predicate before throwing this error, so that the error is not thrown if the predicate evaluates to false at run time.
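The proposed check could look roughly like the following C sketch. It is not pasim's actual code: the insn layout and the function names are hypothetical. It relies on two properties of Patmos: there are eight predicate registers, and $p0 always reads as true.

```c
#include <stdbool.h>

/* Hypothetical view of the instruction that follows a load. */
typedef struct {
    int pred;     /* guarding predicate register, 0..7 */
    bool negate;  /* true for (!$pN) */
    int rs1, rs2; /* source registers, -1 if unused */
} insn;

/* Evaluate the guard; $p0 is always true. */
static bool guard_true(const insn *in, const bool preds[8]) {
    bool p = (in->pred == 0) ? true : preds[in->pred];
    return in->negate ? !p : p;
}

/* Report a load-use violation only if the next instruction actually
   executes and reads the load's destination register. */
bool load_use_violation(int load_rd, const insn *next, const bool preds[8]) {
    if (!guard_true(next, preds))
        return false;
    return next->rs1 == load_rd || next->rs2 == load_rd;
}
```

With this check, (!$p0)add $r1 = $r1, $r2 after a load into $r2 is never reported, while an unpredicated use still is.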

Am I totally off in my understanding of the instruction set and pasim or should this be fixed?

Patmos emulator hangs when starting a core thread on a CPU with index >= 8 in a 16-core config

Steps to reproduce:

  1. Edit patmos/hardware/config/altde2-115.xml to build a multicore system: uncomment lines 7 and 8 and change the core count to 16 as follows (note that Argo is not used because it produces an error, see #104):
  <!-- Default is single core -->
  <pipeline dual="false" />
  <cores count="16"/>
  <!-- <CmpDevs>
  <CmpDev name="Argo" />
  </CmpDevs> -->
  2. After rebuilding with misc/build.sh, save the following code to a file called test.c:
#include <stdio.h>
#include "libcorethread/corethread.h"

void work(void* arg) {
}

int main() {
    int i;
    for (i = 1; i < 16; i++) {
        corethread_create(i, &work, NULL);
        printf("started thread on core %d\n", i);
    }
    return 0;
}
  3. Compile and execute as follows:
$ patmos-clang test.c libcorethread/corethread.c
$ patemu a.out
started thread on core 1
started thread on core 2
started thread on core 3
started thread on core 4
started thread on core 5
started thread on core 6
started thread on core 7
^C

After starting threads on CPUs 1, 2, 3, 4, 5, 6, and 7, the emulator hangs indefinitely.

ethlib and ethernet demo

While setting up Patmos with Ethernet, I found a set of bugs:

  1. ethlib does not configure the initial RX buffer descriptor as empty, so the Ethernet MAC assumes the buffer is full and no packet is written into the system's RAM.
  2. The UDP checksum calculation in ethlib_demo blocks the CPU; I added a line to skip broken packets.
  3. eth_wr and eth_rd are meant for the EMAC I/O device, not for EthMac. A note making this clear would be helpful.
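As a point of reference for item 2, here is a generic Internet checksum (RFC 1071) in C, with a guard that refuses to checksum a packet whose UDP length field is shorter than the 8-byte header or larger than the number of bytes actually received. Trusting that length field is one way a checksum loop can misbehave on a broken packet; whether that is the cause in ethlib_demo is an assumption. The function names are hypothetical, and for UDP the caller must also include the pseudo-header in the sum, which this sketch leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Internet checksum (RFC 1071): one's-complement sum of 16-bit big-endian
   words; an odd trailing byte is padded with zero. */
uint16_t inet_checksum(const uint8_t *data, size_t len) {
    uint32_t sum = 0;
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16) /* fold carries back into the low 16 bits */
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical guard: returns -1 for a broken packet instead of reading
   past the received data; on success writes *out and returns 0. */
int checksum_bounded(const uint8_t *udp, size_t udp_len_field,
                     size_t received, uint16_t *out) {
    if (udp_len_field < 8 || udp_len_field > received)
        return -1;
    *out = inet_checksum(udp, udp_len_field);
    return 0;
}
```

For the sample bytes 00 01 f2 03 f4 f5 f6 f7 used in RFC 1071, inet_checksum returns 0x220d.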

The method cache is filled on every call and return, even if the target is already cached

If I interpret the OCP port signals correctly, it appears that Patmos is filling the method cache on every call and return instruction, regardless of whether the fetched method is already cached.

Steps to observe this behavior:

  1. Save the following test program as test.c:
#include <machine/spm.h>

int fib(int n) {
    if (n < 2) {
        return n;
    }
    return fib(n-1) + fib(n-2);
}

int main() {
    volatile _SPM int *led = (volatile _SPM int *) 0xF0090000;
    *led = 1;
    int res = fib(3);
    *led = 0;
    return 0;
}
  2. Compile, emulate and trace the program with the following commands:
patmos-clang test.c
patemu -v a.out
  3. Disassemble the generated binary and dump the content to a file. Locate the function fib (for me it starts at address 0x20bc4).
patmos-llvm-objdump -d a.out > dump.S
  4. Open the trace file Patmos.vcd with GTKWave or a similar program. The traces of interest are TOP/Patmos/Leds/ledReg (helps with locating the relevant section), TOP/Patmos/cores_0/icache/io_ocp_port_M_cmd and TOP/Patmos/cores_0/icache/io_ocp_port_M_Addr.

As can be seen in the screenshot below, shortly after ledReg is assigned 1, the instruction cache starts fetching addresses 0x20bc0 through 0x20c70, which corresponds to the function fib, as expected. However, after a brief break the cache starts fetching the same addresses again.

Screenshot from 2022-01-03 21-07-42

Zooming out, we can see that the instructions of fib are fetched 9 times in total, after which the instruction cache fetches the main function (address 0x20c84). Hence, fib is fetched every time it is called (5 times in total) and every time a recursive call returns to it (4 times). This is redundant: the function is already in the cache when it is called recursively.

Screenshot from 2022-01-03 21-07-10

This appears to have a significant impact on the performance of Patmos. For algorithms that repeatedly call small leaf functions, inlining those calls to avoid the redundant cache refills reduces the execution time by up to an order of magnitude.
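To make the expected behaviour concrete, the following C sketch models a method cache as a small FIFO-replaced set of function base addresses and counts the refills for the call/return sequence described in this issue: main, then fib on 5 calls and 4 returns, then main again. It is a simplified, hypothetical model (an assumed 16 blocks, one block per function, no size accounting), not the Patmos hardware. With such a cache only 2 refills are needed, compared with the 9 fetches of fib seen in the trace.

```c
#include <stddef.h>
#include <stdint.h>

#define MC_BLOCKS 16 /* assumed number of cache blocks */

typedef struct {
    uint32_t base[MC_BLOCKS]; /* base address of each cached function */
    int valid[MC_BLOCKS];
    size_t next;              /* FIFO replacement pointer */
} method_cache;

void mc_init(method_cache *mc) {
    for (size_t i = 0; i < MC_BLOCKS; i++)
        mc->valid[i] = 0;
    mc->next = 0;
}

/* Look up the target of a call or return; returns 1 if a refill is needed. */
int mc_access(method_cache *mc, uint32_t base) {
    for (size_t i = 0; i < MC_BLOCKS; i++)
        if (mc->valid[i] && mc->base[i] == base)
            return 0; /* hit: no fetch over the OCP port */
    mc->base[mc->next] = base; /* miss: replace the oldest block */
    mc->valid[mc->next] = 1;
    mc->next = (mc->next + 1) % MC_BLOCKS;
    return 1;
}

/* Count refills for a sequence of call/return targets. */
int count_fills(const uint32_t *targets, size_t n) {
    method_cache mc;
    mc_init(&mc);
    int fills = 0;
    for (size_t i = 0; i < n; i++)
        fills += mc_access(&mc, targets[i]);
    return fills;
}
```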

PC setting for Verilator

The way the PC is set to jump to the ELF file's entry point for Verilator is awkward. We may also want to set the PC in our chip project, so maybe we can find a better solution that fits both use cases.

Issue with board definitions

Most board definitions are now broken. One issue (among others) is that the UART is defined multiple times, both in default.xml and in the individual board XML files.

ISA change proposal: allow predicate manipulation in second issue slot

This issue will track the discussion about changing the Patmos ISA to allow predicate manipulation instructions in the second issue slot.

Motivation

As with #73, there is no technical reason for disallowing predicate manipulation instructions in the second issue slot. Allowing them could bring benefits in many ways, especially for single-path code, which contains many independent predicate manipulation instructions and could therefore see significant improvements.

Todo:

  • Implement in simulator
  • Benchmark on simulator

I2controller.scala undefined pin names

After #69, running make BOARD=altde2-115 gen synth now fails with the following output:

[info] Compiling 4 Scala sources to /home/patmos/t-crest/patmos/hardware/target/scala-2.11/classes ...
[error] /home/patmos/t-crest/patmos/hardware/src/main/scala/io/I2controller.scala:68:10: value sclClk is not a member of Chisel.Bundle{val sdaIn: Chisel.Bool; val sdaOut: Chisel.Bool; val sclOut: Chisel.Bool; val sclIn: Chisel.Bool; val i2cEn: Chisel.Bool}
[error]         io.pins.sclClk := sclClk
[error]                 ^
[error] /home/patmos/t-crest/patmos/hardware/src/main/scala/io/I2controller.scala:69:12: value sdaClk is not a member of Chisel.Bundle{val sdaIn: Chisel.Bool; val sdaOut: Chisel.Bool; val sclOut: Chisel.Bool; val sclIn: Chisel.Bool; val i2cEn: Chisel.Bool}
[error]    io.pins.sdaClk := sdaClk
[error]            ^
[error] /home/patmos/t-crest/patmos/hardware/src/main/scala/io/I2controller.scala:126:10: value busy is not a member of Chisel.Bundle{val sdaIn: Chisel.Bool; val sdaOut: Chisel.Bool; val sclOut: Chisel.Bool; val sclIn: Chisel.Bool; val i2cEn: Chisel.Bool}
[error]         io.pins.busy := busy
[error]                 ^
[error] three errors found
[error] (Compile / compileIncremental) Compilation failed

The first two errors appear to be typos; the correct pin names are probably:

io.pins.sclOut
io.pins.sdaOut

emulator has dead code

There is some dead code in the emulator harness (related to setting up on-chip memory). It should be removed.

Issue when building patmos with verilator

Hello,

When I try to build Patmos, it fails on the emulator target with the following error message:

%Error: ../harnessConfig.vlt:2: syntax error, unexpected IDENTIFIER
%Error: Exiting due to 1 error(s)
%Error: Command Failed /usr/bin/verilator_bin --cc ../harnessConfig.vlt Patmos.v --top-module Patmos '+define+TOP_TYPE=VPatmos' --threads 1 -CFLAGS '-Wno-undefined-bool-conversion -O1 -DTOP_TYPE=VPatmos -DVL_USER_FINISH -include VPatmos.h' -Mdir /home/ahmed/t-crest/patmos/hardware/build --exe ../Patmos-harness.cpp -LDFLAGS -lelf --trace

OS: Ubuntu 18.04
verilator: 3.916

404 error during build

The build process does not finish successfully. It looks like some download links used by poseidon are broken:

===== Processing 'poseidon' =====
 ===== Cloning from https://github.com/t-crest/poseidon.git ===== 
Cloning into '/home/bob/t-crest/poseidon'...
remote: Enumerating objects: 3765, done.
remote: Total 3765 (delta 0), reused 0 (delta 0), pack-reused 3765
Receiving objects: 100% (3765/3765), 18.73 MiB | 9.69 MiB/s, done.
Resolving deltas: 100% (2212/2212), done.
~/t-crest/poseidon ~/t-crest
#@-mkdir -p lib 2>&1
#@cd lib && svn checkout http://pugixml.googlecode.com/svn/tags/release-1.2 pugixml
git submodule init
--2021-11-24 12:15:58--  http://mirrors.dotsrc.org/apache//commons/cli/source/commons-cli-1.4-src.tar.gz
Resolving mirrors.dotsrc.org (mirrors.dotsrc.org)... 130.225.254.116, 2001:878:346::116
Connecting to mirrors.dotsrc.org (mirrors.dotsrc.org)|130.225.254.116|:80... connected.
HTTP request sent, awaiting response... Submodule 'lib/pugixml' (https://github.com/zeux/pugixml.git) registered for path 'lib/pugixml'
git submodule update
404 Not Found
2021-11-24 12:15:58 ERROR 404: Not Found.

make: *** [Makefile:67: .common-cli] Error 8
make: *** Waiting for unfinished jobs....
Cloning into '/home/bob/t-crest/poseidon/lib/pugixml'...
Submodule path 'lib/pugixml': checked out '937ac8116e4feac075701d80211c4cafdf673142'

handbook: destination register is undefined during the load delay slot

In the patmos handbook under the description for typed loads it states:

The value of the destination register is undefined during this load delay slot.

This has implications that I think we do not want.
Say we have code that needs to load two values and add both to an existing value in a register. A naive implementation could look like this:

lwc $r1 = [$r2]
lwc $r1 = [$r3]
add $r4 = $r4, $r1
add $r4 = $r4, $r1

However, the above wording would make this code wrong: at the third instruction, the value of $r1 is undefined, since $r1 is the destination register of the second load, which is still in its delay slot. A correct implementation would have to be:

lwc $r1 = [$r2]
lwc $r5 = [$r3]
add $r4 = $r4, $r1
add $r4 = $r4, $r5

This requires an additional register, because $r1 is effectively unavailable as the destination of the second load. If not enough registers are available, a nop would be needed after the second load.

In my opinion, the value of the destination register should be unaffected by the load until after the delay slot. Then we can always reuse registers in successive loads. I also think this is the behavior most programmers would expect.
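
The difference between the two readings can be made concrete with a small C model. It is a hypothetical sketch, not how Patmos implements loads: it assumes exactly one delay slot (as the examples above imply), assumes ALU results are visible to the very next instruction, and ignores bundling and load offsets. With `undefined_in_slot` set, any read of a register whose load is still in flight is flagged, matching the handbook wording; with it cleared, the register simply keeps its old value until the delay slot ends, matching the proposal. The names `run`, `run_example`, `naive`, `with_r5` and the register and memory contents are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of one load delay slot, not the Patmos RTL.
 * A load issued by instruction i becomes visible to instruction i + 2;
 * instruction i + 1 executes in the load's delay slot.
 */
enum opcode { LWC, ADD };

struct insn {
    enum opcode op;
    int rd, rs1, rs2; /* LWC: rd = mem[reg[rs1]]; ADD: rd = reg[rs1] + reg[rs2] */
};

struct pending_load {
    bool valid;
    int rd, value, ready; /* ready: index of the first instruction that sees value */
};

#define NREGS 8
#define MAX_PENDING 2 /* with one delay slot, at most two loads are in flight */

/* Under the handbook wording, reading r is undefined while a load to r is in flight. */
static bool read_is_undefined(const struct pending_load *q, int r, bool undefined_in_slot)
{
    if (!undefined_in_slot)
        return false;
    for (int k = 0; k < MAX_PENDING; k++)
        if (q[k].valid && q[k].rd == r)
            return true;
    return false;
}

/* Executes prog; returns false if any instruction read an undefined register. */
static bool run(const struct insn *prog, int n, int reg[NREGS], const int *mem,
                bool undefined_in_slot)
{
    struct pending_load q[MAX_PENDING] = {{0}};
    bool ok = true;

    for (int i = 0; i <= n + 1; i++) { /* two extra steps drain in-flight loads */
        /* Write back loads whose delay slot has ended. */
        for (int k = 0; k < MAX_PENDING; k++) {
            if (q[k].valid && q[k].ready == i) {
                reg[q[k].rd] = q[k].value;
                q[k].valid = false;
            }
        }
        if (i >= n)
            continue;

        const struct insn *in = &prog[i];
        if (read_is_undefined(q, in->rs1, undefined_in_slot) ||
            (in->op == ADD && read_is_undefined(q, in->rs2, undefined_in_slot)))
            ok = false;

        if (in->op == ADD) {
            reg[in->rd] = reg[in->rs1] + reg[in->rs2];
        } else { /* LWC: reg[rd] keeps its old value until write-back */
            int k = q[0].valid ? 1 : 0;
            assert(!q[k].valid);
            q[k] = (struct pending_load){ true, in->rd, mem[reg[in->rs1]], i + 2 };
        }
    }
    return ok;
}

/* The two sequences from the issue: $r1 reused vs. a second register $r5. */
static const struct insn naive[4] = {
    { LWC, 1, 2, 0 }, { LWC, 1, 3, 0 }, { ADD, 4, 4, 1 }, { ADD, 4, 4, 1 },
};
static const struct insn with_r5[4] = {
    { LWC, 1, 2, 0 }, { LWC, 5, 3, 0 }, { ADD, 4, 4, 1 }, { ADD, 4, 4, 5 },
};

/* Made-up state: $r2 points at mem[0] = 10, $r3 at mem[1] = 20, $r4 = 5. */
static bool run_example(const struct insn *prog, bool undefined_in_slot, int *r4)
{
    static const int mem[2] = { 10, 20 };
    int reg[NREGS] = { 0 };
    reg[2] = 0;
    reg[3] = 1;
    reg[4] = 5;
    bool ok = run(prog, 4, reg, mem, undefined_in_slot);
    *r4 = reg[4];
    return ok;
}
```

With these values, the naive sequence produces $r4 = 35 under the proposed semantics, while under the handbook wording its third instruction's read of $r1 is flagged as undefined. The $r5 variant produces 35 and is accepted under both readings.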

I have already brought this to the attention of @schoeberl, so this issue is mostly to ensure we don't forget.
Also, I don't know how Patmos currently implements the loads or whether a change would be needed to conform to my proposal.

New pin names, broken top-level

Most top-level VHDL files are broken due to the new pin names. Before fixing them, we should go through all top-level designs and discuss which ones to keep.

Writing to the UART using the Patmos emulator (patemu) does not seem to work

When the following program is run in the Patmos emulator (patemu), the character it writes to the UART does not show up on standard output. It does, however, work fine with the simulator (pasim).

#include <machine/spm.h>

int main() {
	/* memory-mapped UART data register, accessed via the _SPM address space */
	volatile _SPM int *uart_data = (volatile _SPM int *) 0xF0080004;
	*uart_data = 'H';
	for(;;);
}

The loop is only there to make sure the program doesn't terminate the UART communication early.

The following two commands were used to compile the program above (myhello.c) and run it in the emulator:

$ make comp APP=myhello
$ patemu tmp/myhello.elf
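
For comparison when narrowing this down, a variant that polls the UART status register before each write can be tried instead of writing the data register unconditionally. This is a sketch under stated assumptions: the status register address 0xF0080000 and the transmit-ready flag in bit 0 are assumptions about the Patmos UART layout and should be checked against the handbook, and `uart_putc`/`uart_puts` are hypothetical helpers. Plain `volatile int *` pointers are used so the logic also compiles and can be exercised on a host; on Patmos the parameters would be declared `volatile _SPM int *` and point at the two UART registers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Polling UART write (sketch). ASSUMPTION: status register at 0xF0080000,
 * data register at 0xF0080004, and status bit 0 set when the transmitter
 * can accept a character. On Patmos, declare the pointers as
 * `volatile _SPM int *` (from <machine/spm.h>).
 */
#define UART_TX_READY 0x01 /* assumed "transmit ready" bit */

static void uart_putc(volatile int *status, volatile int *data, char c)
{
    assert(status != NULL && data != NULL);
    while ((*status & UART_TX_READY) == 0)
        ; /* busy-wait until the transmitter is free */
    *data = c;
}

/* Sends a NUL-terminated string one character at a time. */
static void uart_puts(volatile int *status, volatile int *data, const char *s)
{
    while (*s != '\0')
        uart_putc(status, data, *s++);
}
```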

assembler crash

It appears that some special symbols make the assembler crash (e.g., in an old version of fetch_double.s).
