The ot-sca from lowrisc

[tvla] Usability and performance improvements, code cleanup

With #71 we have moved our TVLA code to the public repo. Besides adding new features, we have so far identified the following things to improve:

Use typer (as in capture.py) to simplify the handling of input arguments.
Allow specifying a command line argument to configure the logger, see #71 (comment)
Refactor the main function to simplify it, e.g., by moving chunks of code into helper functions.
Parallelize computation using Ray (see ceca.py).
Provide yaml config files to configure specific modes of the TVLA code.
Reduce number of command line arguments.
Add some form of CI support?
Change naming of input arguments to make them more intuitive.
Store keys and plaintexts in the trace file. Currently we need to open the project file even when we are using traces from the trace file.
Allow for analysis of only a part of the trace (e.g. by specifying start_sample and end_sample)
Figure layouts for specific and general tests should be the same.
Allow specifying the range of points in trace to be shown in t_value vs number of samples graphs. Currently constant range is used.
TVLA configuration file for aes tvla_cfg_aes.yaml should be added.
Two arguments, namely mode and general_test are used to specify the type of test. This should be reworked to use only one argument with the value specifying either the general test or the type of the specific test.
An output file tmp/ttest.npy containing the analysis results is generated if and only if cfg["input_file"] is provided. This should be a separate option because it is not related to the input file.
Allow ttest-step.npy to be saved whenever n_steps != 0, regardless of the other input parameters
Make separate input arguments for save_to_disk_trace and save_to_disk_leakage
Make trace filtering controllable from the command line / cfg file
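
For the start_sample / end_sample item above, a minimal sketch of a windowed t-test could look as follows. This is not the code in tvla.py: it assumes traces are plain lists of floats, and the function names welch_t and ttest_window are made up for illustration. It computes Welch's t-statistic per sample and only for the requested window.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic for two groups of measurements at one sample point."""
    var_a, var_b = variance(group_a), variance(group_b)
    denom = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    # Identical constant groups give a zero denominator; report no difference.
    return (mean(group_a) - mean(group_b)) / denom if denom else 0.0


def ttest_window(fixed_traces, random_traces, start_sample=0, end_sample=None):
    """Compute t-values only for samples in [start_sample, end_sample)."""
    n_samples = len(fixed_traces[0])
    end_sample = n_samples if end_sample is None else end_sample
    return [
        welch_t([t[i] for t in fixed_traces], [t[i] for t in random_traces])
        for i in range(start_sample, end_sample)
    ]


# Toy data: three traces per group, three samples per trace.
fixed = [[1.0, 5.0, 2.0], [1.2, 5.2, 2.1], [0.9, 4.9, 1.9]]
rand = [[1.1, 1.0, 2.0], [0.9, 1.3, 2.2], [1.0, 0.8, 1.8]]
t_values = ttest_window(fixed, rand, start_sample=1, end_sample=3)
```

Skipping samples outside the window also skips their computation, which is where the performance gain would come from.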

Add support for CW-Husky

Our capture scripts still use the CW-Lite scope. We should switch to CW-Husky which has higher capture performance.

[ecdsa] Implement golden model

Similar to AES, KMAC and SHA3 we should have a golden model running in parallel with the target to verify the produced signature on the host side.

Implement Correlation-Enhanced Power Analysis Collision Attack

Evaluate and consider implementation of the correlation-enhanced power analysis collision attack formulated here: https://eprint.iacr.org/2010/297.pdf

[capture] CW310 / Husky power measurements, level toggles during capture

When capturing traces on a CW310 / Husky setup, we get traces on two different levels, seemingly at random. Tested using: ./capture.py --cfg-file capture_aes_cw310.yaml capture aes-random --num-traces 100 --plot-traces 100. The trace distributions fall on those two levels:

cc @vogelpi @vrozic @nasahlpa

Automatic testing of tvla.py script

With PR #95, we can now load histograms and use them for the general test. This is done automatically when a PR is created, but it does not cover all functionality of the tvla.py script. We should also add an automatic test which gets power measurements, computes their t-values and compares them with the expected values.

In addition, there may be other functionality of the tvla.py script that could be tested. Maybe @vrozic, @vogelpi and @andreaskurth have some ideas?

[capture] Change device-side trigger; remove FPGA-specific HW modification

Currently, the device SW sets the trigger HI/LO and in HW this is AND-ed with the AES !IDLE signal. This generates tight trigger HI/LO envelopes around the AES computation. For the longer term, we need to change this behaviour and have the trigger purely controlled by SW. TODO: Change this and test.

[ci] Automatic comparison of traces

Currently, traces generated in the CI need to be manually inspected. Ideally, an automated comparison of generated traces to a golden model is conducted.
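
One possible shape for such a check, as a sketch only: compare each captured trace against a stored reference trace using the Pearson correlation and fail below a threshold. The reference trace, the function names and the 0.8 threshold are all assumptions for illustration, not something the CI does today.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys):
        raise ValueError("traces must have the same number of samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # A flat trace carries no shape information.
    return cov / (norm_x * norm_y)


def trace_matches_golden(trace, golden, min_corr=0.8):
    """Hypothetical CI check: the trace must correlate with the reference."""
    return pearson(trace, golden) >= min_corr


golden = [0.0, 1.0, 4.0, 1.0, 0.0]
scaled = [2 * v + 0.1 for v in golden]  # Same shape, different gain/offset.
inverted = [4.0, 0.0, 0.0, 0.0, 4.0]    # Clearly different shape.
```

Correlation ignores gain and offset differences between setups, which may or may not be desirable for a CI check.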

ktp.next() for keys

As discussed in #115 and https://docs.google.com/document/d/1sZjASiUji_IT-t9jEFlK-kGc_Gw3mGfubNvpkM_IQT4/edit#heading=h.qd65x0y76b0e we should check the usage of key, text = ktp.next() in cw/capture.py

Default behavior is that ktp.next() returns const_key, random_plaintext. This default behavior can be changed with ctx.obj.ktp.fixed_key = False.

Afaik, we call ktp.next()[0] only once per function and use only the first call of the iterator to assign a constant key. For random keys we use ktp.next()[1]. Thus, I think we are fine and can change the default behavior, but we should double-check that and read the spec to see whether the first call of the iterator still outputs a deterministic value. On the other hand, if we really rely on a constant, non-random value, we should use a constant and not a constant iterator.
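
To make the two options concrete, here is a small stand-in that mimics the behavior described above. It is not the ChipWhisperer ktp class; StandInKtp, its seed handling and the FIXED_KEY value are invented for illustration.

```python
import random


class StandInKtp:
    """Hypothetical stand-in for the behavior described above (not the real class)."""

    def __init__(self, seed=0, key_len=16, text_len=16):
        self._rng = random.Random(seed)
        self._key_len = key_len
        self._text_len = text_len
        self.fixed_key = True  # Default: constant key, random plaintext.
        self._const_key = bytes(range(key_len))

    def _rand_bytes(self, n):
        return bytes(self._rng.getrandbits(8) for _ in range(n))

    def next(self):
        key = self._const_key if self.fixed_key else self._rand_bytes(self._key_len)
        text = self._rand_bytes(self._text_len)
        return key, text


# Alternative: an explicit constant (arbitrary example value) instead of
# relying on the first call of the iterator being deterministic.
FIXED_KEY = bytes(range(16))

ktp = StandInKtp()
key_1, text_1 = ktp.next()
key_2, text_2 = ktp.next()
```

With an explicit constant, changing the iterator's default would no longer affect code that needs a fixed key.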

board variable should be an enum

The variable to distinguish the board (CW310 vs. CW305) is currently just a string, but this should be an enum. See comment.
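
A minimal sketch of what this could look like. The names Board and parse_board are hypothetical, and the string values are assumed to match what the scripts currently pass around.

```python
from enum import Enum


class Board(Enum):
    """Supported target boards (hypothetical enum for illustration)."""

    CW305 = "cw305"
    CW310 = "cw310"


def parse_board(name: str) -> Board:
    """Convert a config/CLI string to a Board; raises ValueError if unknown."""
    return Board(name.strip().lower())


board = parse_board("CW310")
```

Unknown board names then fail early at parsing time instead of falling through string comparisons later.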

Implement automation to upload/download traces and bitstreams to/from GCP storage buckets

Implement a mechanism to upload and download bitstreams and traces to/from gcp buckets. Additional requirements:

Store metadata associated with bitstream to be able to reproduce the build. For example, git commit hash + patches.
Store metadata associated with the traces to be able to reproduce the capture:
- Capture device and associated settings.
- Bitstream and test firmware.
- Any additional data required to recreate the physical setup.
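
As a starting point for discussion, the metadata could be a small record serialized to JSON and uploaded next to the traces. The field names below are assumptions derived from the list above, and the values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CaptureMetadata:
    """Hypothetical metadata record stored alongside a set of traces."""

    bitstream_commit: str                                  # Git commit hash of the build.
    bitstream_patches: list = field(default_factory=list)  # Patches applied on top.
    firmware: str = ""                                     # Test firmware identifier.
    capture_device: str = ""                               # E.g. scope type.
    scope_settings: dict = field(default_factory=dict)     # Gain, offset, ...
    setup_notes: str = ""                                  # Anything needed to rebuild the setup.


def to_json(meta: CaptureMetadata) -> str:
    """Serialize with sorted keys so that diffs between captures stay readable."""
    return json.dumps(asdict(meta), indent=2, sort_keys=True)


meta = CaptureMetadata(
    bitstream_commit="<git commit hash>",
    capture_device="<scope>",
    scope_settings={"gain_db": 0, "offset": 0},  # Placeholder values.
)
serialized = to_json(meta)
```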

Report the number of correctly guessed differences in ceca.py.

See this comment in #28.

Instrument TVLA test cases for AES

Integrate TVLA testing for the AES core using ChipWhisperer's test suite as documented here: https://cwtvla.readthedocs.io/en/latest/

scope.clock.clkgen_freq vs pll_frequency in batch modes

While writing the code for the otbn capture and helping debugging the otbn-batch mode, @wettermo and I stumbled across the following lines

ot-sca/cw/cw_segmented.py

Lines 175 to 187 in 111c140

    
    def _configure_scope(self, scope_gain, offset):
        self._scope.gain.db = scope_gain
        if offset >= 0:
            self._scope.adc.offset = offset
        else:
            self._scope.adc.offset = 0
            self._scope.adc.presamples = -offset
        self._scope.adc.basic_mode = "rising_edge"
        if self._is_husky:
            # We sample using the target clock * 2 (200 MHz).
            self._scope.clock.adc_mul = 2
            self._scope.clock.clkgen_freq = 100000000
            self._scope.clock.clkgen_src = 'extclk'

Can someone explain to me why self._scope.clock.clkgen_freq = 100000000 is hardcoded there? Is this just a leftover or is there a particular reason? I'd have expected self._scope.clock.clkgen_freq = pll_frequency, like in

ot-sca/cw/util/device.py

Line 130 in 111c140

scope.clock.clkgen_freq = pll_frequency

While writing, I just discovered that @bilgiday had changed this behavior in cw/util/device.py in f51c6ec

So, may I assume, we could change this also in the cw_segmented.py, or is this a bad idea for some reason?

[CW310] Fix detection whether bitstream has been programmed previously

On the CW310, the detection whether a bitstream has been programmed previously doesn't work. The bitstream is always programmed which takes around 15 seconds due to the bigger FPGA.

Implement trace conversion scripts to support Riscure's trs format

Riscure has made a trs library available here: https://github.com/Riscure/python-trsfile. Implement a script to interoperate with ChipWhisperer traces.

Replace kmac128 from pyXKCP with pycryptodome

In #54 we added support for KMAC using pyXKCP, which uses some vendored C sources. We did that because at that time there was no good Python KMAC implementation available.
However, this adds some C sources and some cffi scripting.

In the meantime, kmac128 and kmac256 were added to pycryptodome v3.12.0.

Do we want to streamline our Python flow? This of course has a very low priority, but could be a good first issue for someone else.

[ecdsa] Extend setup to allow for random and fixed data

FYI @bilgiday

[capture] Improve/simplify fvsr key batch capture device commands

Currently, the device starts generating plaintexts and keys directly from PRNG seed=0 (using a Mersenne twister PRNG).
The command is split into two phases: execute, and generate the next batch of keys and plaintexts in memory.

Device-side could be improved:

Provide command to set initial plaintext
Provide command to deliberately set (initial) sample_fixed value
Remove split and compute next key and plaintext directly before execution. To prevent a systematic difference b/w fixed and random key, always generate a next key using PRNG but only use if sample_fixed = 0. This is easy when using a SW AES instead of PRNG.
Transfer full ciphertext instead of only 4 bytes at the end of each batch to simplify capture side. Performance penalty is negligible.
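
The "always generate, only use if sample_fixed = 0" idea can be sketched host-side in Python as below. The real change would be in the device firmware; next_key and its parameters are hypothetical names, and Python's random module (a Mersenne twister) merely stands in for the on-device PRNG.

```python
import random


def next_key(prng, fixed_key, sample_fixed, key_len=16):
    """Return the key for the next encryption.

    A random key is always drawn, so the PRNG advances by the same amount
    whether or not the fixed key ends up being used.
    """
    candidate = bytes(prng.getrandbits(8) for _ in range(key_len))
    return fixed_key if sample_fixed else candidate


fixed_key = bytes(16)          # Arbitrary example value.
prng_a = random.Random(0)
prng_b = random.Random(0)
key_a = next_key(prng_a, fixed_key, sample_fixed=True)
key_b = next_key(prng_b, fixed_key, sample_fixed=False)
```

After either call the two generators are in the same state, which is the property that avoids a systematic difference between the fixed and random cases.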

Consider removing FTDI related code and update readme

Now that #23 is merged and programming OpenTitan using SAM3U seems to be working we should consider removing FTDI related code and update the getting started doc.

[capture] Naming convention for capture commands

Test capture script #150 uncovered a problem with naming of some of the capture commands and arguments. At the moment we don't use consistent naming strategy, which makes it complicated to automate the testing. Since the repository is likely to be extended with newer tests, we should come up with a consistent naming strategy to make maintaining easier.
Tasks:

Document the naming strategy
Modify capture.py and the test scripts.

get_fpga_buildtime

Please see #36 for context and newaetech/chipwhisperer#306 for what needs to be done.

[capture] Change PRNG for on-device test data generation (TVLA)

We currently use Python random host-side and an implementation of the same Mersenne twister on device side.

There is a danger of the Python class changing.
This is not precisely in line with the definition of TVLA.
Using something like AES to generate random data is very straightforward / comprehensible.

We could replace it by a SW implementation of AES to generate the data on-device.

Deprecated bokeh parameter used in cw/util/plot.py

In python-requirements.txt we do not specify the version of bokeh. In one of the recent versions, plot_width was replaced by width. (The changelog says it was removed in the docs in version 2.4; however, I only encountered the problem with release 3.0.3, not with 2.4.3.)

This causes some errors in

ot-sca/cw/util/plot.py

Line 17 in 7455181

plot = figure(plot_width=800)

When changing the parameter, I encountered bokeh.core.serialization.SerializationError: can't serialize <class 'range'>
which is also discussed in bokeh/bokeh#12313 (comment)

So, it looks like there are more things to deal with when upgrading to 3.0.3.

What's the best way to fix this issue?
Pin the bokeh version or fix it and require a bokeh update?
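
A third option would be a small compatibility shim, sketched below without importing bokeh. It assumes, based on the observation above, that plot_width still works in 2.x and that width is required from 3.0 on; figure_size_kwargs and as_plain_list are made-up helper names. Converting range objects to lists before plotting is an assumed workaround for the serialization error, not something verified here.

```python
def figure_size_kwargs(bokeh_version: str, width: int = 800) -> dict:
    """Pick the keyword accepted by figure() for the given bokeh version string."""
    major = int(bokeh_version.split(".")[0])
    key = "width" if major >= 3 else "plot_width"
    return {key: width}


def as_plain_list(values):
    """Turn range objects (or other iterables) into plain lists before plotting."""
    return list(values)


kwargs_new = figure_size_kwargs("3.0.3")
kwargs_old = figure_size_kwargs("2.4.3")
x_values = as_plain_list(range(4))
```

In plot.py this would be used roughly as figure(**figure_size_kwargs(bokeh.__version__)). Pinning the version is simpler if nobody needs 3.x yet.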

[ecdsa] Try increasing sampling rate with updated firmware

Currently, we're sampling with 50 MHz and the target is running at ~5 MHz for ECDSA captures to keep a 10x oversampling ratio (some of this work is not yet in the repo). Above 50 MHz it seems the synchronization between multiple runs/segments is lost. We should re-investigate this with an updated Husky firmware.

[podman] Add container to CI

As discussed with @vrozic , currently there is no CI job for testing the container. To ensure that future changes to the container infrastructure are working, adding a new CI job could be useful.

add doc for sha3 capture and tvla

@y-srini mentioned in the ot-sca meeting that some doc regarding sha3 capture and tvla would be helpful.
I'll push a PR soon, but want to address #111 first.

Dockerfile may need updating

We haven't been updating the Dockerfile for a while and it would be worth checking whether it's still working with the latest changes in ot-sca or if it needs updating.

[capture] Specify offset in clk cycles

At the moment, measurement offset in capture configuration files is specified in samples.
It probably makes more sense to use clk cycles of the target as a unit of measurement.
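
The conversion itself is simple. A sketch, assuming the sampling frequency is an integer multiple of the target clock as in the Husky setup (target clock * 2); cycles_to_samples is a hypothetical helper name.

```python
from fractions import Fraction


def cycles_to_samples(offset_cycles: int, target_freq_hz: int, sample_freq_hz: int) -> int:
    """Convert an offset in target clock cycles to an offset in ADC samples."""
    samples = Fraction(sample_freq_hz, target_freq_hz) * offset_cycles
    if samples.denominator != 1:
        raise ValueError("offset does not map to a whole number of samples")
    return int(samples)


# 100 MHz target sampled at 200 MHz: two samples per clock cycle.
offset_samples = cycles_to_samples(40, 100_000_000, 200_000_000)
```

Negative offsets pass through unchanged in sign, so the existing presamples handling would still apply.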

CW310 binaries built from OpenTitan master currently not working

At the moment, binaries built for the CW310 board from OpenTitan master don't seem to work. I am not 100% sure what the root cause for this is. I suspect the main issue is that we changed the clock frequency for the CW310 some time ago to speed up TLT (see this PR lowRISC/opentitan#19479 or specifically commit lowRISC/opentitan@162cdab).

As a temporary workaround, we can use the ot-sca_2023-07-15_dd8b709_cw310 tag on my fork as a base (this is when I last updated CW310 bitstreams and binary): https://github.com/vogelpi/opentitan/releases/tag/ot-sca_2023-07-15_dd8b709_cw310

aes_serial does not send a response for the first check version command

Noticed this while working on #23 (should not be related). When I run simple_capture_traces.py and send multiple commands, it sends a response for each one except for the first one.

Restructure the repo directories

Currently, data capture, target and analysis scripts are stored in the same directory. Consider a top level view similar to the following:

tree .
.
├── analysis (models, pre/post processing libraries)
├── attacks (mountable attacks)
├── capture (data capture)
│   ├── cw
│   └── picoscope
├── obj (access to precompiled bitstreams, etc, TBD)
├── targets (target boards)
│   └── cw305
├── tests (integration tests, and end-to-end testing)
└── utils (utility libraries, e.g. cw to trs conversion scripts)

[objs] update ECC384 binaries

OT PR 19380 updated ecc384_serial.c and p384_ecdsa_sca.s to enable trace capturing with the latest setup.

Binaries should be updated.

Add wildcards to .gitattributes to include all bin files in git lfs

This is to remind me to have a look at #109 (comment)

[capture] Trace storage: Future-proof solution

We currently hold all traces in RAM after capture / during analysis and write to CW projects, i.e. numpy files. Larger trace amounts require sequential iterations through traces to reduce RAM consumption.
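
To illustrate what sequential iteration could look like, the sketch below consumes traces from any iterable in fixed-size chunks and keeps only per-sample running sums in memory. The storage backend is left open; iter_chunks and running_mean are hypothetical names and traces are assumed to be plain lists of floats.

```python
def iter_chunks(traces, chunk_size):
    """Yield lists of at most chunk_size traces from any iterable of traces."""
    chunk = []
    for trace in traces:
        chunk.append(trace)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def running_mean(traces, chunk_size=1000):
    """Per-sample mean over all traces, holding only one chunk at a time."""
    count = 0
    sums = None
    for chunk in iter_chunks(traces, chunk_size):
        for trace in chunk:
            if sums is None:
                sums = [0.0] * len(trace)
            for i, value in enumerate(trace):
                sums[i] += value
            count += 1
    return [s / count for s in sums] if count else []


# A generator stands in for a file-backed trace source.
lazy_traces = ([float(i), float(2 * i)] for i in range(5))
mean_trace = running_mean(lazy_traces, chunk_size=2)
```

The same pattern extends to variances and t-test accumulators, so analysis would not need all traces in RAM at once.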

[capture] Implement all needed capture.py-provided commands using simple capture script

When the simple capture script is mature enough, use it as a template to implement the remaining capture commands.

React to changes in spi and spiflash

SPIFLASH_RAW_BUFFER_SIZE (sw/device/boot_rom/spiflash_frame.h) increased from 1024 to 2048 lowRISC/opentitan#5068.

I took a quick look and unless I missed something this shouldn't break our workflow but we should increase the frame size for the sake of efficiency.

@vogelpi @moidx fyi.

[ecdsa] Don't reset target between traces

It seems the target is currently reset between capturing different traces or trace segments which leads to very low performance as the binary needs to be transferred over SPI for every segment. We should look into changing the capture setup to not require a reset of the target between segments.

Remove spiflash dependency from the ot-sca repo

Implement firmware bootstrap using the native target SPI interface, or the capture board SPI connection available in the target connector.

This removes the dependency on the SPI FTDI cable currently used to load the firmware on the device.

Debug aes-fvsr-key-batch timing issues

PR #82 added the corresponding target binaries to support the aes-fvsr-key-batch command. For most of us this seems to work just fine but I saw lots of errors like:

Fixed key: b'811e3731b0120a7842781e22b25cddf9'
Connected to ChipWhisperer (num_samples: 1200, num_samples_actual: 1200, num_segments_actual: 1)
Capturing:  16%|███▋                   | 162/1000 [00:00<00:01, 492.97 traces/s]
Traceback (most recent call last):
  File "./capture.py", line 771, in <module>
    app()
 ...
  File "./capture.py", line 502, in aes_fvsr_key_batch
    capture_aes_fvsr_key_batch(
  File "./capture.py", line 460, in capture_aes_fvsr_key_batch
    check_ciphertext(ot, expected_last_ciphertext, 4)
  File "./capture.py", line 221, in check_ciphertext
    assert actual_last_ciphertext == expected_last_ciphertext[0:ciphertext_len], (
AssertionError: Incorrect encryption result!
actual: bytearray(b'^\xdc\xe8{')
expected: [ 85 154 226   3  30  12  31 147  54 197 151  35   4 134  80  98]

In other words, there are sometimes mismatches between the expected and received ciphertexts. Sometimes the failure occurs for the first batch, sometimes for a later batch, and sometimes it doesn't occur at all. The failures seem to depend on timing (adding some sleep command on the Python side seems to help), Husky firmware (the latest firmware seems to be more affected), and maybe also the USB connection setup (docking station, hub, laptop directly).

We should root-cause and fix the problem. Otherwise we can't reliably do long-running captures. Imagine we collect 10 million traces and get such a failure after 2 hours: all traces will be lost.

Deprecate CW-Lite support

All of us now have access to the Husky scope and we almost exclusively use Husky instead of CW-Lite due to the better ADC resolution, higher sampling frequency and higher capture rate (larger on-chip buffer). Therefore we should consider deprecating CW-Lite support to clean up the repository.
