byuccl / bfat


12.0 12.0 1.0 783 KB

Bitstream Fault Analysis Tool

License: Apache License 2.0

Python 98.90% Tcl 1.10%


bfat's Issues

Extend support to HCLK and RIO Tiles

Add bit identification and association for bits used in HCLK and RIO tiles so that fault errors in these tiles can be identified.

Allow scripts to be run in any directory

Currently, bfat.py and the other utility scripts only work correctly when run from the main bfat directory. We need to make it possible for these scripts to run correctly from any directory.
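One common way to make a script location-independent is to resolve its data and helper paths relative to the script file rather than the current working directory. A minimal sketch, assuming the scripts live in the repository root; `REPO_ROOT` and `repo_path` are hypothetical names, not existing BFAT code:

```python
from pathlib import Path

# Hypothetical: the directory containing this script, which stays the same
# no matter which directory the user runs the script from.
REPO_ROOT = Path(__file__).resolve().parent


def repo_path(*parts: str) -> Path:
    """Return an absolute path to a file inside the repository."""
    return REPO_ROOT.joinpath(*parts)
```

Replacing relative `open("some/file")` calls with `open(repo_path("some", "file"))` would give the same result whether a script is started from the bfat directory or elsewhere via an absolute path.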

Extend Support to UltraScale and UltraScale+ Architectures

BFAT currently only supports the part architectures included in the Project X-Ray database, but it would be nice to be able to run it on designs implemented on UltraScale or UltraScale+ architectures.

Add multithread processing functionality

BFAT would run much faster if it were able to schedule different analysis processes on different threads.
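A minimal sketch of fanning per-bit analysis out to a worker pool using the standard library's `concurrent.futures`. Here `analyze_bit` is a hypothetical stand-in for BFAT's per-bit analysis. Threads suit work that mostly waits on external processes (such as Vivado queries); for CPU-bound pure-Python work, `ProcessPoolExecutor` would likely be needed instead because of the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_bit(bit):
    """Hypothetical stand-in for BFAT's per-bit fault analysis."""
    frame, word, bit_index = bit
    return {"bit": bit, "summary": f"frame {frame} word {word} bit {bit_index}"}


def analyze_bits(bits, max_workers=4):
    # executor.map returns results in input order, so the report order stays
    # deterministic even though the bits are analyzed concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(analyze_bit, bits))


results = analyze_bits([("00020aa0", "027", "31"), ("0000002a", "000", "00")])
for r in results:
    print(r["summary"])
```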

Switch VivadoQuery TCL interface to use Telnet instead of Python subprocesses

This could improve the stability and functionality of the TCL-based design query.

Support Net Tracing to identify PIPs and corresponding bits

In our latest BFAT analysis of the VexRiscv Linux system, we found that the DDR data lines are the largest contributor to faults in our TMR system. It would be nice to "predict" which bits these are and actually count the sensitive bits associated with these nets. To do this, we need the ability to trace a net from its source to all of its sinks and return all the PIPs and bits associated with it, so we can get a feel for the sensitivity of the net. This would show how "long" the nets are in terms of CRAM sensitivity.
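The tracing step amounts to a graph traversal from the net's source to every sink, collecting the PIPs used along the way. A minimal sketch, assuming the routed net is available as an adjacency map from each node to `(pip, next_node)` pairs; the graph format and node/PIP names are hypothetical, and mapping each PIP to its CRAM bits would still need a database lookup (for example, Project X-Ray's segbits files):

```python
from collections import deque


def trace_net(routing, source):
    """Breadth-first trace of a routed net from its source to all sinks.

    routing: dict mapping node -> list of (pip, next_node) pairs (hypothetical format).
    Returns (pips, sinks): PIPs used by the net, and nodes with no outgoing PIPs.
    """
    pips, sinks = [], []
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        edges = routing.get(node, [])
        if not edges:
            sinks.append(node)  # nothing downstream, so treat it as a sink
        for pip, nxt in edges:
            pips.append(pip)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return pips, sinks


# Toy net: one source fanning out to two sinks.
routing = {
    "SRC": [("PIP_A", "N1")],
    "N1": [("PIP_B", "SINK1"), ("PIP_C", "SINK2")],
}
pips, sinks = trace_net(routing, "SRC")
print(len(pips), sinks)
```

The number of PIPs (or of CRAM bits they map to) would then serve as the rough "length" of the net in CRAM-sensitivity terms.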

Full design sensitivity estimation

Expanding on issue #29, it would be nice if we could analyze a design's .dcp and produce a list of sensitive resources/bits with BFAT. This may not be fully comprehensive, but it would let us estimate the sensitivity of a design ahead of time.

Review related works

The following paper is related and should be reviewed for contributions to bfat.

`find_fault_bits.py` doesn't work with designs that have the letter 'b' in their names

Steps to reproduce:

1. Compile a design whose name contains the letter b.
2. Run find_fault_bits.py.

Example:

Design name is 'betrusted_soc'. Header looks like this:

00000000  00 09 0f f0 0f f0 0f f0  0f f0 00 00 01 61 00 2f  |.............a./|
00000010  62 65 74 72 75 73 74 65  64 5f 73 6f 63 3b 55 73  |betrusted_soc;Us|
00000020  65 72 49 44 3d 30 58 46  46 46 46 46 46 46 46 3b  |erID=0XFFFFFFFF;|
00000030  56 65 72 73 69 6f 6e 3d  32 30 32 30 2e 32 00 62  |Version=2020.2.b|
00000040  00 0c 37 73 35 30 63 73  67 61 33 32 34 00 63 00  |..7s50csga324.c.|
00000050  0b 32 30 32 32 2f 30 38  2f 30 32 00 64 00 09 31  |.2022/08/02.d..1|
00000060  34 3a 31 35 3a 33 38 00  65 00 21 72 8c ff ff ff  |4:15:38.e.!r....|
00000070  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |................|
00000080  ff ff ff ff ff ff ff ff  ff ff ff ff ff 00 00 00  |................|
00000090  bb 11 22 00 44 ff ff ff  ff ff ff ff ff aa 99 55  |..".D..........U|

This line in bfat/bitread.py (line 98 at commit 42c4a78):

if ord(byte) == 98:

It finds the letter b (ord value 98) in the design name and decides that this is the beginning of the part number record, when it is actually part of the design name.

Workaround:

dd if=betrusted_soc.bit of=betrusted_soc_trunc.bit skip=30 bs=1

This simply lops off the first 30 bytes of the file (the header through the design name "betrusted_soc;") and allows the script to run on the resulting _trunc.bit file.

A more permanent solution might be to parse the bitstream looking for a more robust sentinel. I'm not familiar enough with the .bit format to recommend what that would be, but searching for the 'b' with a leading and trailing 0x00, i.e. the sequence [0x00, 0x62, 0x00], should be robust, since the design name is terminated with a ';' character and not a null. See also http://www.pldtool.com/pdf/fmt_xilinxbit.pdf. This sequence would work for any part number record of up to 255 bytes (a length of 256 or more would put a nonzero byte, such as 0x01, after the 0x62), but I don't know of any Xilinx part numbers that long.
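Because the .bit header is a sequence of length-prefixed records, it can also be parsed by walking the records in order instead of scanning for a byte value. A minimal sketch based on the layout in the linked document: a 2-byte-length field holding the 9-byte magic, a 2-byte-length field holding the first key `a`, then keyed records where `a` through `d` carry a 2-byte big-endian length and `e` carries a 4-byte length followed by the raw bitstream. `parse_bit_header` is a hypothetical helper, not the existing bitread.py code:

```python
import struct


def parse_bit_header(data: bytes) -> dict:
    """Walk the records of a Xilinx .bit header (hypothetical helper).

    Returns the text of records 'a'-'d' (design info, part, date, time)
    plus the offset and length of the raw bitstream (record 'e').
    """
    pos = 0
    # Field 1: 2-byte big-endian length, then that many bytes (the magic).
    (length,) = struct.unpack_from(">H", data, pos)
    pos += 2 + length
    # Field 2: 2-byte length (normally 1) whose content is the first key, 'a'.
    pos += 2
    fields = {}
    while pos < len(data):
        key = chr(data[pos])
        pos += 1
        if key == "e":
            # Record 'e' uses a 4-byte length; the bitstream follows it.
            (length,) = struct.unpack_from(">I", data, pos)
            pos += 4
            fields["bitstream_offset"] = pos
            fields["bitstream_length"] = length
            return fields
        if key not in "abcd":
            raise ValueError(f"unexpected record key {key!r} at offset {pos - 1}")
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        # Text records are null-terminated ASCII strings.
        fields[key] = data[pos:pos + length].rstrip(b"\x00").decode("ascii")
        pos += length
    raise ValueError("header ended before the 'e' (bitstream) record")
```

Since each record's length is read rather than guessed, a 'b' inside the design name is never mistaken for the part number key. On the header dumped above, record `a` comes out as "betrusted_soc;UserID=0XFFFFFFFF;Version=2020.2" and record `b` as "7s50csga324".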

Update VivadoQuery handling of GND and VCC nets

GND and VCC nets vary from design to design and need to be handled accordingly. Develop a way to automatically identify potential GND/VCC nets and their connections. (GND/VCC net names generally end with some form of GND/VCC; the nets may have many names but are all routed the same way, although this could be incorrect for some designs.)
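A first cut at automatic detection could be a name-based heuristic that follows the observation above. This sketch assumes constant nets can be recognized by the last component of their hierarchical name ending in GND/VCC (optionally with a numeric suffix) or being Vivado's `<const0>`/`<const1>`; the patterns are assumptions and, as noted, may misclassify some designs, so a connectivity check would still be needed to confirm.

```python
import re

# Heuristic (assumption): the last hierarchical name component ends in
# GND/VCC, optionally with a numeric suffix, or is <const0>/<const1>.
_GND = re.compile(r"(GND(_\d+)?|<const0>)$", re.IGNORECASE)
_VCC = re.compile(r"(VCC(_\d+)?|<const1>)$", re.IGNORECASE)


def classify_const_net(net_name: str):
    """Return 'GND', 'VCC', or None for an ordinary signal net."""
    leaf = net_name.rsplit("/", 1)[-1]
    if _GND.search(leaf):
        return "GND"
    if _VCC.search(leaf):
        return "VCC"
    return None


for name in ["<const0>", "top/u_core/GND_2", "cpu/vcc", "ddr_dq[3]"]:
    print(name, classify_const_net(name))
```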

Provide a simple HDL example

Provide a simple HDL/XDC and its corresponding DCP as an example for users to run. With the source, the users can generate their own DCP to make sure they can run the tool on a simple example.

Support UltraScale devices

BFAT currently only supports 7-series devices, based on Project X-Ray. It would be nice to support UltraScale devices based on Project U-Ray.

TMR Sensitivity Analysis

Extending issue #30, it would be nice to be able to analyze a TMR design, identify its single points of failure, and ignore the sensitivity of the TMR'd portions.

xc7a35t not supported

Because the xc7a35t is missing a tilegrid.json file in Project X-Ray, the part is not currently supported by BFAT.

Include more specific details in the fault report for some bits

For undefined bits, print out information about the possible tiles they could affect even though a specific resource and function could not be determined for the bit in the Project X-Ray database. If no possible tiles could be found, make that clear.

For some CLB functions (such as CLKINV), there is no cell mapped onto the BEL. We need to properly detect these bits and classify them correctly, making it clear that there isn't a design resource related to that BEL.

Provide more clear API for determining resource used by bit

As part of the fault analysis process, BFAT must determine the resources associated with a given CRAM bit. However, this determination is embedded in higher-level fault analysis logic. It would be helpful to separate the higher-level fault analysis from the low-level CRAM bit resource identification. This way, users could apply BFAT simply to identify resources, or even build other higher-level functions on top of this resource identification.

Can design query support format for F4PGA?

The top.route file from an F4PGA build folder may work in replacement of a Vivado DCP file.

CLB LUT Fault Modeling

It would be nice to model failures in the LUTs of a CLB. While the current approach clearly indicates that a LUT bit has been upset, it would be nice to see how the overall LUT changed. A simple approach would be to print the complete LUT contents before and after the LUT upset. This could later be expanded to evaluate how it impacts the logic function based on the number of inputs (for example, if one of the LUT inputs is hard coded to a value it is possible that the LUT upset is a "don't care"). Some logic evaluation could help with this understanding.
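The before/after comparison and the "don't care" check can be sketched for a 6-input LUT whose contents are a 64-bit INIT value, where entry i holds the output for the input combination whose I0..I5 values are the bits of i. This assumes the upset has already been translated into an INIT bit index (that CRAM-bit-to-INIT-bit mapping is device-specific and not shown); `truth_table`, `flip_lut_bit`, and `upset_is_masked` are hypothetical helpers.

```python
def truth_table(init: int, num_inputs: int = 6):
    """List (input_index, output) pairs for a LUT INIT value."""
    return [(i, (init >> i) & 1) for i in range(1 << num_inputs)]


def flip_lut_bit(init: int, index: int) -> int:
    """Return the INIT value after an upset of entry `index`."""
    return init ^ (1 << index)


def upset_is_masked(index: int, tied_inputs: dict) -> bool:
    """True if the upset entry can never be selected.

    tied_inputs maps input number (0-5) to the constant it is tied to.
    Entry `index` is only addressed when bit k of `index` equals that constant.
    """
    return any(((index >> k) & 1) != value for k, value in tied_inputs.items())


# Example: O = I0 & I1 in a LUT6 is INIT 0x8888888888888888.
before = 0x8888888888888888
after = flip_lut_bit(before, 7)
changed = [i for (i, a), (_, b) in zip(truth_table(before), truth_table(after)) if a != b]
print(hex(before), "->", hex(after), "changed entries:", changed)
# If I2 is tied to 0, entry 7 (I2 = 1) is unreachable, while entry 3 is not.
print(upset_is_masked(7, {2: 0}), upset_is_masked(3, {2: 0}))
```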

LICENSE file should be full text of the license

The general best practice is to make the LICENSE file the full text of the license and then put a reference to it in the README and at the top of each file. If you do this, GitHub should detect that this repository is available under the Apache 2.0 license.

You can see examples under https://github.com/chipsalliance and https://github.com/f4pga

Add support for RapidWright

Add support for using RapidWright instead of Vivado for design database querying.

Provide test script to test installation

Provide a test script that can be used to test the installation of bfat as well as the other tools. The script could serve as the starting point for CI (automated testing).
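A minimal sketch of such an installation check, using only the standard library to confirm the Python version and that required executables are on the PATH. The minimum Python version (3.8) and the tool list are assumptions for illustration, not stated BFAT requirements, and `check_environment` is a hypothetical name.

```python
import shutil
import sys


def check_environment(min_python=(3, 8), tools=("vivado",)):
    """Return a dict of check name -> (ok, detail) for a basic install check."""
    results = {"python": (sys.version_info >= min_python, sys.version.split()[0])}
    for tool in tools:
        path = shutil.which(tool)
        results[tool] = (path is not None, path or "not found on PATH")
    return results


def main() -> int:
    checks = check_environment()
    for name, (ok, detail) in checks.items():
        print(f"{'PASS' if ok else 'FAIL'}  {name}: {detail}")
    # A nonzero exit code lets a CI job fail when any check fails.
    return 0 if all(ok for ok, _ in checks.values()) else 1
```

A CI job could run `sys.exit(main())` as its first step, then go on to run BFAT on a small example design.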

Same inputs yield different results when the tool is run twice

I'm running Vivado 2022.1 and Python 3.9, and invoking the tool with this command:

python3.9 find_fault_bits.py betrusted_soc.bit betrusted_soc_route.dcp -r -d
Running off of commit 415fab1

If I run this command twice in a row, I get different outputs for e.g. betrusted_soc_sample_bits.json.

One run, for example, yields this:

[
    [
        [
            "00020aa0",
            "027",
            "31"
        ]
    ],
    [
        [
            "00020691",
            "051",
            "22"
        ]
    ],
    [
        [
            "00020694",
            "051",
            "22"
        ]
    ],
    [
        [
            "00020694",
            "052",
            "00"
        ]
    ],
    [
        [
            "0000002a",
            "000",
            "00"
        ]
    ],
    [
        [
            "00400120",
            "002",
            "15"
        ],
        [
            "00400215",
            "004",
            "07"
        ]
    ]
]

and the next will give this

[
    [
        [
            "00020ca2",
            "086",
            "31"
        ]
    ],
    [
        [
            "00020e15",
            "080",
            "09"
        ]
    ],
    [
        [
            "00000904",
            "093",
            "07"
        ]
    ],
    [
        [
            "0000090e",
            "094",
            "11"
        ]
    ],
    [
        [
            "0000002a",
            "000",
            "00"
        ]
    ],
    [
        [
            "00401320",
            "004",
            "15"
        ],
        [
            "00001515",
            "083",
            "07"
        ]
    ]
]

Here's what the diff of the two is:

4,5c4,5
<             "00020aa0",
<             "027",
---
>             "00020ca2",
>             "086",
11,13c11,13
<             "00020691",
<             "051",
<             "22"
---
>             "00020e15",
>             "080",
>             "09"
18,20c18,20
<             "00020694",
<             "051",
<             "22"
---
>             "00000904",
>             "093",
>             "07"
25,27c25,27
<             "00020694",
<             "052",
<             "00"
---
>             "0000090e",
>             "094",
>             "11"
39,40c39,40
<             "00400120",
<             "002",
---
>             "00401320",
>             "004",
44,45c44,45
<             "00400215",
<             "004",
---
>             "00001515",
>             "083",

Is this the expected outcome? My assumption is the tool would list all the problem bits and it would be deterministic, but perhaps I am not understanding the nature of the tool correctly.

To reproduce, you can copy these files (temporarily staged in a weird directory, will be removed eventually):

https://ci.betrusted.io/trng/betrusted_soc_route.dcp
https://ci.betrusted.io/trng/betrusted_soc.bit
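To tell whether two runs differ only in ordering or in the bits actually chosen, the nested JSON (a list of groups, each a list of [frame address, word, bit] triples) can be normalized into sets and compared. A minimal sketch; `normalize`, `compare`, and `compare_runs` are hypothetical helpers:

```python
import json


def normalize(groups):
    """Turn [[[frame, word, bit], ...], ...] into a set of frozensets of tuples,
    so comparisons ignore the order of groups and of bits within a group."""
    return {frozenset(tuple(bit) for bit in group) for group in groups}


def compare(groups_a, groups_b):
    a, b = normalize(groups_a), normalize(groups_b)
    return {"common": a & b, "only_first": a - b, "only_second": b - a}


def compare_runs(path_a, path_b):
    """Compare two sample-bits JSON files written by separate runs."""
    with open(path_a) as fa, open(path_b) as fb:
        return compare(json.load(fa), json.load(fb))
```

Applied to the two outputs shown above, only the `0000002a` group is common to both runs, so the difference is in the bits themselves and not just their order.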

Provide support for .ll and essential bits file parsing

Provide support for parsing .ll files and essential bits files so that users can easily identify potential bits that can be analyzed. Users could generate a list of bits to query from these files without doing fault injection or radiation testing.
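A minimal sketch of reading bit locations from a Vivado logic location (.ll) file. It assumes lines of the form `Bit <offset> 0x<frame address> <frame offset> [key=value ...]`, and that the frame offset splits into a 32-bit word index and bit index (which matches the word/bit columns in the sample-bits JSON above). Both the line layout and the `parse_ll_line` helper are assumptions to verify against real .ll output.

```python
import re

# Assumed .ll line layout: "Bit <offset> 0x<frame address> <frame offset> [fields...]"
_LL_LINE = re.compile(r"^Bit\s+(\d+)\s+0x([0-9A-Fa-f]+)\s+(\d+)\s*(.*)$")


def parse_ll_line(line: str):
    """Return (frame_hex, word, bit, info) for a 'Bit' line, else None."""
    m = _LL_LINE.match(line.strip())
    if m is None:
        return None
    _offset, frame, frame_offset, rest = m.groups()
    # Assumes 32-bit configuration words within a frame.
    word, bit = divmod(int(frame_offset), 32)
    # Keep only key=value tokens such as Block=..., Latch=..., Net=...
    info = dict(item.split("=", 1) for item in rest.split() if "=" in item)
    return frame.lower().zfill(8), f"{word:03d}", f"{bit:02d}", info
```

The returned frame/word/bit strings use the same zero-padded format as the sample-bits JSON, so they could be fed to BFAT as a list of bits to query.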