tapasco issues
https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues
Updated: 2020-04-03T17:49:05Z

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/124
Evaluate 64 bit for platform_addr_t
Updated: 2020-04-03T17:49:05Z
Author: Jaco Hofmann
All platforms except for legacy Zynq, such as the PCIe based systems or MPSoC, use addresses wider than 32 bit. While we currently get by with smaller addresses, this might change in the future and we should consider a move to 64 bit addresses.
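To illustrate the concern, here is a minimal Python sketch of what happens when an address above the 4 GiB boundary is stored in a 32 bit field versus a 64 bit field. The helper names and the example address are hypothetical and not part of TaPaSCo; `ctypes` is used only to mimic fixed-width C integers.

```python
import ctypes


def store_addr32(addr: int) -> int:
    """Model storing an address in a 32 bit field: the upper bits are dropped."""
    return ctypes.c_uint32(addr).value


def store_addr64(addr: int) -> int:
    """Model storing an address in a 64 bit field."""
    return ctypes.c_uint64(addr).value


def fits_in_32bit(addr: int) -> bool:
    """True if the address survives a cast to 32 bit unchanged (legacy Zynq case)."""
    return store_addr32(addr) == addr


# Hypothetical device address just above the 4 GiB boundary.
HIGH_ADDR = 0x1_0000_1000
```

A check like `fits_in_32bit` is one way a 32 bit platform could guard the casts mentioned in the issue, assuming it only ever receives addresses below 4 GiB.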
I currently don't see any problem with just changing the address width. The Zynq platform should continue to work with the required casts, and all other platforms currently cast to 64 bit addresses anyway.

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/169
Investigate Logic Utilization reports
Updated: 2020-04-03T12:24:05Z
Author: Carsten Heinz
It seems that the utilization report does not make sense for BRAM in the user logic. Sometimes utilization for user logic is higher than for the complete system logic.
Assignee: Carsten Heinz

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/167
Interrupts go missing sometimes
Updated: 2020-04-03T11:43:05Z
Author: Jaco Hofmann
The PCIe MSIx interrupts coming from the DMA engine are received properly by the interrupt controller. The interrupt controller properly issues an AXI write request to the correct address in host memory. The PCIe AXI bridge does ACK the transfer and Bresp is OKAY. However, sometimes the interrupts do not reach the host for some reason. This can be confirmed by checking /proc/interrupts.
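One way to make that check repeatable is to sum the per-CPU counters in `/proc/interrupts` and compare the total with the number of interrupts the device is expected to have raised. The following Python sketch assumes the usual Linux layout (a header row of CPU columns, then one row per IRQ); the function name, the sample text, and the `tlkm_*` handler names are hypothetical.

```python
def irq_totals(proc_interrupts_text: str, name_filter: str) -> dict:
    """Sum per-CPU counts for every IRQ row whose description contains name_filter."""
    lines = proc_interrupts_text.strip().splitlines()
    num_cpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        irq, _, rest = line.partition(":")
        fields = rest.split()
        counts = fields[:num_cpus]
        description = " ".join(fields[num_cpus:])
        # Skip rows that do not match or that have non-numeric count columns.
        if name_filter in description and all(c.isdigit() for c in counts):
            totals[irq.strip()] = sum(int(c) for c in counts)
    return totals


# Hypothetical excerpt; real handler names depend on how the driver registers them.
SAMPLE = """\
           CPU0       CPU1
 24:        100         20   PCI-MSI 524288-edge      tlkm_dma
 25:          3          4   PCI-MSI 524289-edge      tlkm_pe
NMI:          0          0   Non-maskable interrupts
"""
```

On a live system the text would come from `open("/proc/interrupts").read()`; a total that is lower than the number of completed DMA transfers would point to lost interrupts.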
This might be related to the interrupt controller taking too long. However, the DMA interrupt simply increases a value and schedules the userspace. This should not take too long.
Another possibility is that the PCIe bridge loses data when it is under heavy load.
For now as a quick fix I will try to disable a certain interrupt whenever the interrupt has just fired and see if that fixes the problem at the cost of latency. If that doesn't help maybe there is some possibility to remove protocol converters in between the interrupt handler and the PCIe bridge to avoid problems with those.
Overall, there is no clear indication of what might go wrong as long as we don't have the hardware to debug right on the PCIe bus.

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/165
AXI Interconnect does not handle AXI4 -> AXI4 Lite properly for small transfers
Updated: 2020-03-18T17:12:05Z
Author: Jaco Hofmann
It seems like the AXI interconnect does not handle protocol conversion from AXI4 to AXI4-Lite properly and ignores the strb signal on reads. Accordingly, whenever a request that is larger than the AXI4-Lite slave data width comes in, e.g. through PCIe, it will result in superfluous transactions. That's not a big deal for writes as the strb signal is set properly. However, for reads there is no such signal in AXI4-Lite, and if the read has some effect on the state of the device it will result in hard-to-debug problems. This is known to Xilinx but seems to be wont-fix: https://forums.xilinx.com/t5/Embedded-Development-Tools/AXI4-gt-AXI-Lite-wstrb-behavior/td-p/645535

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/162
LED feature on VC709 crashes Vivado
Updated: 2020-03-04T22:47:08Z
Author: Jens Korinth
Enabling the LED feature on VC709 compositions reproducibly crashes Vivado.
While this is certainly a Vivado bug, we should investigate a workaround.

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/160
Local memory slots not considered in area estimation, causing DSE to fail
Updated: 2019-12-18T09:54:11Z
Author: Jens Korinth
If a PE has local memories (or more than one slave interface, for that matter), DSE will still try to build more instances than will fit in the current 128 slots limit. There are several possible solutions:
1. Have separate enumeration for memory slots (affects status core, `platform_info` and potentially requires a more sophisticated way to determine accessibility for each PE).
2. Fix the algorithms to account for each slave interface instead of just assuming one.
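As a rough illustration of the second option, the following Python sketch counts slots per slave interface instead of per PE. It assumes that every slave interface (control or local memory) occupies exactly one of the 128 slots; the function names and the example kernels are hypothetical and do not reflect the actual DSE code.

```python
SLOT_LIMIT = 128  # current slot limit mentioned in the issue


def max_instances(slaves_per_pe: int, slot_limit: int = SLOT_LIMIT) -> int:
    """Upper bound on instances of one PE if each slave interface takes one slot."""
    if slaves_per_pe < 1:
        raise ValueError("a PE needs at least one slave interface")
    return slot_limit // slaves_per_pe


def composition_fits(composition: dict, slaves: dict,
                     slot_limit: int = SLOT_LIMIT) -> bool:
    """Check a composition {kernel: count} against {kernel: slave interfaces}."""
    used = sum(count * slaves[kernel] for kernel, count in composition.items())
    return used <= slot_limit
```

With this accounting a PE with one control interface and one local memory can be instantiated at most 64 times, so the DSE could discard larger compositions before starting a run.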
Need to think about it some more; I guess each PE will always have exactly _one_ control slave interface. We could require a naming convention to identify it if more than one candidate is present on a PE, e.g., `S_AXI_CTRL` or similar. All other slave interfaces could be assigned a base address from a different pool, e.g., using the upper 64 base addresses already reserved for platform addresses. But we'd have to come up with some O(k) or at least O(n) scheme to find the base addresses of all slaves on a PE. :thinking:
Milestone: 2018.2

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/137
TaPaSCo is stuck after all jobs finished in verbose mode
Updated: 2019-10-30T15:34:08Z
Author: Jens Korinth
When verbose-mode is activated (`-v`) and logs are tracked, the `MultiFileWatcher`s prevent TaPaSCo from exiting normally. Check that all watchers are properly terminated after their corresponding job has ended.

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/129
Fallback option of requested amount of MSI-X interrupts is not available
Updated: 2019-10-23T11:56:06Z
Author: Jaco Hofmann
The driver currently simply fails if the OS is not able/willing to provide the requested number of interrupts.
There should be a fallback option that gets enabled automatically if the requested number of interrupts cannot be provided.
Assignee: Jaco Hofmann

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/127
[ZCU102] Evaluate is very slow
Updated: 2019-10-16T13:46:04Z
Author: Jaco Hofmann
When running evaluate on a small core (~6000 LUTs) the process takes about 3 to 4 minutes for the 7-Series devices. When run on the ZCU102 Zynq Ultrascale+ device the same process requires over an hour.
For example, Phase 3 Initial Routing requires more than 40 minutes instead of about a minute on the other platforms.
Testing was done with Vivado 2016.4.

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/107
[VC709] Separate addresses of different memory regions
Updated: 2019-07-09T11:03:05Z
Author: Jaco Hofmann
Currently TPC has different devices at the same address depending on the viewpoint. For example, the TPC configuration registers start at 0x0, which is visible from the host. The on-board DDR memory is also located at 0x0 but is only visible to the DMA engine and the PEs. It might be advisable to split these memory regions. A new address map could look like
| Address | Device |
| --- | --- |
| 0x0001000000000000 | MIG |
| 0x0002000000000000 | Configuration |
| 0x0003000000000000 | PEs |
etc. Accordingly, Configuration and PEs would be separated into different BARs.

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/93
BlueDMA support in ZC706
Updated: 2019-06-26T06:28:06Z
Author: Jens Korinth
ZC706 could benefit from a DMA engine feature that allows using the on-board DDR banks. Port BlueDMA to Zynq and implement a Platform `Feature` for it.

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/174
Compose fails after HLS runs
Updated: 2019-01-22T16:31:55Z
Author: Jaco Hofmann
Sometimes a compose job fails after successful HLS runs with the following error:
```bash
[16:22:41 <pool-1-thread-2: ImportTask> INFO] Import of 'arrayinit_axi4mm.zip' with target axi4mm@vc709
[16:22:41 <pool-1-thread-2: Import$> INFO] SynthesisReport for arrayinit not found, starting evaluation ...
[16:22:41 <pool-1-thread-2: EvaluateIP$> INFO] starting evaluation of /home/wimi/jah/projects/tapasco/tapasco_2018.2/core/arrayinit/axi4mm/vc709/ipcore/arrayinit_axi4mm.zip for xc7vx690tffg1761-2@1000,000 MHz, output in /tmp/372075065893313964/evaluate.log
[16:30:38 <pool-1-thread-2: EvaluateIP$> INFO] evaluation of /home/wimi/jah/projects/tapasco/tapasco_2018.2/core/arrayinit/axi4mm/vc709/ipcore/arrayinit_axi4mm.zip for xc7vx690tffg1761-2@1000,000 MHz finished successfully, report in /home/wimi/jah/projects/tapasco/tapasco_2018.2/core/arrayinit/axi4mm/vc709/ipcore/arrayinit_export.xml
[16:30:38 <pool-1-thread-3: VivadoHighLevelSynthesis$> INFO] starting run 'arraysum' for axi4mm@vc709: output in /home/wimi/jah/projects/tapasco/tapasco_2018.2/core/arraysum/axi4mm/vc709/hls/axi4mm.log
[16:31:16 <pool-1-thread-3: VivadoHighLevelSynthesis$> INFO] Vivado HLS finished successfully for 'arraysum' for axi4mm@vc709
[16:31:16 <main: HighLevelSynthesis$> INFO] all HLS tasks have finished.
[16:31:16 <main: HighLevelSynthesis$> WARN] executed HLS with co-sim for [Kernel @/home/wimi/jah/projects/tapasco/tapasco_2018.2/kernel/arraysum/kernel.json]
Name = arraysum
TopFunction = arraysum
Version = 1.0
Files = /home/wimi/jah/projects/tapasco/tapasco_2018.2/kernel/arraysum/arraysum.c
TestbenchFiles = /home/wimi/jah/projects/tapasco/tapasco_2018.2/kernel/arraysum/arraysum-tb.c
CompilerFlags =
TestbenchCompilerFlags =
Args = arr by reference
OtherDirectives = None, but no co-simulation report was found
[16:31:16 <pool-1-thread-2: ImportTask> INFO] Import of 'arraysum_axi4mm.zip' with target axi4mm@vc709
[16:31:16 <pool-1-thread-2: Import$> INFO] SynthesisReport for arraysum not found, starting evaluation ...
[16:31:16 <pool-1-thread-2: EvaluateIP$> INFO] starting evaluation of /home/wimi/jah/projects/tapasco/tapasco_2018.2/core/arraysum/axi4mm/vc709/ipcore/arraysum_axi4mm.zip for xc7vx690tffg1761-2@1000,000 MHz, output in /tmp/9791558545089762559/evaluate.log
[16:39:08 <pool-1-thread-2: EvaluateIP$> INFO] evaluation of /home/wimi/jah/projects/tapasco/tapasco_2018.2/core/arraysum/axi4mm/vc709/ipcore/arraysum_axi4mm.zip for xc7vx690tffg1761-2@1000,000 MHz finished successfully, report in /home/wimi/jah/projects/tapasco/tapasco_2018.2/core/arraysum/axi4mm/vc709/ipcore/arraysum_export.xml
[16:39:08 <main: Compose$> INFO] all HLS tasks finished successfully, beginning compose run...
[16:39:08 <pool-1-thread-4: ComposeTask> ERROR] java.lang.Exception: could not find all required cores for target axi4mm@vc709, missing: arrayinit, arraysum
```
Assignee: Lukas Sommer

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/166
Add PE to interrupt mapping in Status Core
Updated: 2019-01-22T16:27:20Z
Author: Jaco Hofmann
Interrupts are currently mapped iteratively to the corresponding interrupt line. To increase flexibility, the status core can store the mapping used.
Advantages are flexible mappings that enable the use of more than one interrupt per PE.

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/164
Properly unregister the driver if a device is removed
Updated: 2019-01-22T16:25:45Z
Author: Jaco Hofmann
Fix errors occurring because the tlkm driver does not react properly to device remove requests.
Milestone: 2018.2
Assignee: Jaco Hofmann

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/163
Implement tapasco_load_bitstream* functions
Updated: 2019-01-22T16:25:12Z
Author: Jens Korinth
Since its inception, the TaPaSCo/TPC API has had two functions to load a new bitstream at runtime. This is meant to support complex use cases where an application switches between multiple bitstreams optimized for the specific stage of computation. This is arguably a useful thing and reasonably simple to implement on Zynq (given appropriate permissions on `/dev/xdevcfg`).
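For the Zynq case, a minimal Python sketch of such a loader could simply stream the bitstream file into the device node. This assumes a kernel that exposes the legacy `/dev/xdevcfg` interface and a bitstream file already in the format that interface accepts; the function name `load_bitstream_zynq` is hypothetical and not the TaPaSCo API.

```python
XDEVCFG = "/dev/xdevcfg"  # device node named in the issue


def load_bitstream_zynq(bitstream_path: str, device_path: str = XDEVCFG) -> int:
    """Stream a bitstream file into the device node; returns the bytes written.

    Raises PermissionError or FileNotFoundError if the node is not accessible.
    """
    written = 0
    with open(bitstream_path, "rb") as src, open(device_path, "wb") as dst:
        # Copy in 1 MiB chunks to avoid holding the whole bitstream in memory.
        while chunk := src.read(1 << 20):
            dst.write(chunk)
            written += len(chunk)
    return written
```

The `device_path` parameter exists mainly so the function can be exercised against an ordinary file when no Zynq board is at hand.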
Is there a way to implement similar support on PCIe devices with reasonable effort? I suppose it would involve an ICAP as a platform component; however, I'm not sure if this works with non-partial bitstreams.

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/161
Allow PE Masters to have any valid AXI Data Width
Updated: 2019-01-22T16:23:47Z
Author: Jaco Hofmann
The data width of PE masters is currently limited to either 32 or 64 bit. Considering that most platforms outside of Zynq have much wider memory controllers, it is beneficial to support all valid AXI Data Widths up to 1024 bits. This might also be relevant for Zynq platforms if the designer of a PE wants to keep their logic simple and rely on data width converters to interface with the memories correctly.
Assignee: Jens Korinth

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/154
Allow direct view of the device memory on PCIe
Updated: 2019-01-22T16:15:25Z
Author: Jaco Hofmann
This can be implemented by using a sliding window and a second BAR. The Xilinx Core does not support this feature directly, though.
Will use a little Bluespec Module that has one configuration register for the address offset and forwards the requests accordingly.
Assignee: Jaco Hofmann

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/144
Support PE-local memories in HLS
Updated: 2019-01-22T16:12:26Z
Author: Jens Korinth
Use new PE-local memory support to enable a new kind of HLS port pattern: `localmem`. A Tcl script should automatically wrap the PE with BRAM and also make the BRAM accessible via secondary S-AXI. Using the new PE-local memories, it should be possible to use BRAMs for HLS-based kernels, e.g., AES.
Assignee: Jens Korinth

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/103
Make synthesis and implementation effort configurable
Updated: 2019-01-22T16:05:11Z
Author: Jaco Hofmann
The default settings used at the moment are AlternateRoutability + Retiming for Synthesis and Explore + PHYS_OPT_DESIGN for Implementation. These settings could be considered to be very high effort. A switch could be added to let the user decide between different "effort levels".
For most synthesis runs it is not necessary to go with very high effort, and the user might be happy about the much lower run-time.

https://git.esa.informatik.tu-darmstadt.de/tapasco/tapasco/-/issues/106
TPC-Debug: Monitor Device Registers looks for INTC0
Updated: 2019-01-22T16:04:27Z
Author: Jaco Hofmann
With the changes to MSIx there is only one Interrupt Controller left, and that one does not expose any status registers right now, so the red warning "INTC0: 0xffffffff" might be misleading.
Assignee: Jens Korinth