
nfdi4plants / arctrl

Library for management of Annotated Research Contexts (ARCs) using an in-memory representation and runtime-agnostic contract systems.

License: MIT License

F# 99.10% Batchfile 0.02% Shell 0.01% JavaScript 0.75% HTML 0.02% Vue 0.07% Python 0.04%
fair-data fsharp isa rdm arc fable-libraries

arctrl's People

Contributors

bfrommer, brilator, flowetzels, freymaurer, hlweil, kmutagene, olscholz, omaus, xiaoranzhou

arctrl's Issues

[BUG] QSheet.Inputs returns sample if no string

Describe the bug && To Reproduce
I create a QAssay from a byte [] stream and convert it to a QSheet list as such:

let ms = new System.IO.MemoryStream(byteArray)
    let _,assay = ISADotNet.XLSX.AssayFile.Assay.fromStream ms
    let tables = QueryModel.QAssay.fromAssay assay
    tables.Sheets 
    |> List.map (fun (s: QSheet) -> s.Inputs) // map over QSheets
    |> printfn "[INPUTS]: %A"

This is the relevant file:
image

And this the console output:
image

As you can see, the fields with a value are correctly written as Some Source, whereas the others are written as Some Sample.

Expected behavior
All fields are Some Source.

[BUG] ISADotNet.QueryModel: QProcessSequence.ValuesOf(node,ProtocolName) does not return the correct values as stored in isa.assay.xlsx

Describe the bug
QProcessSequence.ValuesOf(node,ProtocolName) does not return the correct values as stored in isa.assay.xlsx

To Reproduce
I created a test.fsx in a sample ARC which reproduces the error; I invited HLWeil.

Expected behavior
calling:

#r "nuget: arcIO.NET, 0.0.6"
#r "nuget: ISADotNet.QueryModel, 0.7.0-preview.5"

open ISADotNet
open ISADotNet.QueryModel

let arcPath = __SOURCE_DIRECTORY__ + @"\..\"

let p,a = arcIO.NET.Assay.readByName arcPath "testassay"
let qa = QueryModel.QStudy.fromAssay a

qa.ProtocolNames

let allSamples = 
    qa.LastNodes()
    |> Set.ofSeq


let getBioRep (fN:QNode) = 
    match qa.ValuesOf(fN,ProtocolName = "Growth").WithName("biological replicate").Values.Head with
    | QueryModel.ISAValue.Characteristic x -> x.Value.Value.AsString 
    | _ -> failwith "no biorep please add"
    
let t1 = 
    allSamples 
    |> Array.ofSeq 
    |> Array.map getBioRep 

should return the values stored in the isa.assay file:

image

but returns:
image

Screenshots

Additional context
calling

let getGenotype (fN:QNode) = 
    match qa.ValuesOf(fN,ProtocolName = "GenotypeLib").WithName("Genotype").Values.Head with
    | QueryModel.ISAValue.Characteristic x -> x.Value.Value.AsString 
    | _ -> failwith "no genotype please add"

/// Seems to return correct values
let t2 = 
    allSamples 
    |> Array.ofSeq 
    |> Array.map getGenotype     

returns correct results

[Feature Request] Assay file reader

Assay files are modified using the swate tool. To access this information for other tasks, a reader should be added.
As the swate tool adds tables to assay xlsx files, a prerequisite for this reader is a table reader in FSharpSpreadsheetML. The information could be stored as grouped processes.

[Fable] Fable reflection and IEnumerable

The Problem

At the moment ISADotNet uses a function, heavily abusing reflection to append lists/seqs and arrays as obj.

let inline appendGenericListsByType l1 l2 (t:Type) =
    System.Reflection.Assembly
        .GetAssembly(typeof<_ list>)
        .GetType(if isArray then "Microsoft.FSharp.Collections.ArrayModule" else "Microsoft.FSharp.Collections.ListModule")
        .GetMethod("Append")
        .MakeGenericMethod(t)
        .Invoke(null, [|l1;l2|])

All of the functions used here are not fable compatible, so how can we translate it?

Tried Solutions

I tried using #if FABLE_COMPILER to give an alternative solution in which we just assume the type to be a list.

Force List.append

#if FABLE_COMPILER
!!List.append l1 l2 // `!!` means the compiler should ignore any typechecks here
#else
...

This solution works for lists but not for arrays or seqs. Arrays are not appended correctly; seqs are appended correctly but no longer match the seq type.

So I tried another solution, in which I wanted to match the list against any Array type. This works in dotnet in a .fsx, but it does not even compile with Fable, as Fable cannot do such checks at runtime because JS does not support them.

try match input to array

match l1 with
| :? System.Array as arr ->
    !!Array.append l1 l2
| _ -> 
    !!List.append l1 l2

l1.GetType().IsArray

The same thing goes for an if...else with l1.GetType().IsArray.

warning FABLE: Types can only be resolved at compile time. At runtime this will be same as `typeof

-> Therefore, in Fable, l1.GetType().IsArray in an inline function taking obj will always resolve as obj and will never return true.

Using Emit

Emit can tell fable to change the output of a function fully to any js code written inside the Emit attribute.

Fable uses special classes to represent F# IEnum types, with different append methods, so there is no one function to rule them all, and it would require typechecking at runtime again (which does not work). Unless there is a way, unknown to me, of converting all of these Fable classes to a generic JS array, this will also not work.

How do f# IEnum types look in js? repl

let l = [1 .. 20]
let a = [|1 .. 20|]
let s = seq [1 .. 20]

import { toArray, toList } from "fable-library/Seq.js";
import { rangeDouble } from "fable-library/Range.js";

export const l = toList(rangeDouble(1, 1, 20));

export const a = toArray(rangeDouble(1, 1, 20));

export const s = toList(rangeDouble(1, 1, 20));

Result

It might be necessary to remove the reflection in this file and use type-safe functions based on generics instead. This would result in a rather large redesign.

I am open to suggestions on how to solve this issue @HLWeil @muehlhaus @kMutagene
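
A minimal sketch of what a type-safe, reflection-free design could look like, assuming the collection kind is tracked explicitly in a wrapper type. `Collection<'T>` and `Collection.append` are hypothetical names for illustration and are not part of ISADotNet:

```fsharp
// Hypothetical wrapper type; not part of ISADotNet.
type Collection<'T> =
    | ListOf of 'T list
    | ArrayOf of 'T []
    | SeqOf of seq<'T>

module Collection =
    /// Appends two collections of the same kind without any reflection.
    let append (c1: Collection<'T>) (c2: Collection<'T>) : Collection<'T> =
        match c1, c2 with
        | ListOf a, ListOf b -> ListOf (List.append a b)
        | ArrayOf a, ArrayOf b -> ArrayOf (Array.append a b)
        | SeqOf a, SeqOf b -> SeqOf (Seq.append a b)
        | _ -> failwith "Cannot append collections of different kinds"

let appended = Collection.append (ArrayOf [| 1; 2 |]) (ArrayOf [| 3 |])
```

Because the kind is decided by pattern matching on a union case rather than by a runtime type test, this should not depend on `GetType()` at all. The cost is that every call site has to wrap its values, which is part of the larger redesign mentioned above.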

[BUG] Unexpected ISADotNet.Viz Assay representation

Describe the bug
This is a follow up issue to #51. I added an input column Source Name and put the previous information from Data File Name in there and added artificial names to the now empty output column. These columns now look like this:

Data File Name Source Name
result1 DB_097_CAMMD_CAGATC_L001_R1_001.fastq.gz
result2 DB_099_CAMMD_CTTGTA_L001_R1_001.fastq.gz
result3 DB_103_CAMMD_AGTCAA_L001_R1_001.fastq.gz
result4 DB_161_reC3MD_GTCCGC_L001_R1_001.fastq.gz
result5 DB_163_reC3MD_GTGAAA_L001_R1_001.fastq.gz
result6 DB_165_re-C3MD_GTGAAA_L002_R1_001.fastq.gz

The DAG will now be displayed, but I found two odd occurrences:

image

As you can see, the DAG shows that sheets 3 and 4 are applied twice. I cannot find a reason why this should be intended; maybe you can help me out here.

[BUG] Non descriptive Error with missing input/output in ISADotNet.Viz

Describe the bug
I use the code block below in Swate to display Swate tables as Viz in embedded HTML. For the attached assay.xlsx file I get a non-descriptive error: "Object reference not set to an instance of an object.". I found that the last sheet, 4COM01_RNASeq, is missing an Input Column; when I add it, the error is gone.

let factors, protocol, assay =  JsonExport.parseBuildingBlockSeqsToAssay worksheetBuildingBlocks
let processSequence = Option.defaultValue [] assay.ProcessSequence
/// This function throws the error, all above works
let dag = Viz.DAG.fromProcessSequence (processSequence,Viz.Schema.NFDIBlue)
let dagHtml = dag |> CyjsAdaption.MyHTML.toEmbeddedHTML

metabolomics_example.xlsx

To Reproduce
See Bug description

Expected behavior
Make the error message more descriptive for the user
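
One possible shape for such a check, sketched under the assumption that missing inputs or outputs can be detected before the DAG is built. `ProcessSketch` and `validateProcess` are hypothetical stand-ins and not the ISADotNet or ISADotNet.Viz types:

```fsharp
// Hypothetical, simplified stand-in for an ISA process; not the ISADotNet type.
type ProcessSketch =
    { Name: string
      Inputs: string list option
      Outputs: string list option }

/// Returns Ok when inputs and outputs are present, otherwise a descriptive error.
let validateProcess (p: ProcessSketch) : Result<ProcessSketch, string> =
    let isMissing (xs: string list option) =
        match xs with
        | None | Some [] -> true
        | Some _ -> false
    if isMissing p.Inputs then
        Error (sprintf "Process '%s' has no input column." p.Name)
    elif isMissing p.Outputs then
        Error (sprintf "Process '%s' has no output column." p.Name)
    else
        Ok p

let checkedProcess =
    validateProcess { Name = "4COM01_RNASeq"; Inputs = None; Outputs = Some [ "result1" ] }
```

Returning a `Result` keeps the decision of whether to throw or to display the message with the caller, for example Swate.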

[BUG] ISADotNet reference

When referencing ISADotNet like this #r "nuget: ISADotNet" in a .fsx I get the following error:

image

This might be due to dependencies without fixed minimum version.

[BUG] build -t watchdocs unable to process Conversions.fs

While testing the new library structure with the fslab-docs template, it returned an error.

API docs:
  generating model for 2 assemblies in API docs...
  loading 2 assemblies...
  registering entities for assembly ISADotNet...
  registering entities for assembly ISADotNet.XLSX...
Error :
FSharp.Compiler.ErrorLogger+UnresolvedPathReferenceNoRange: Assembly: DocumentFormat.OpenXml, full path: DocumentFormat.OpenXml.Spreadsheet.Row
   at FSharp.Compiler.TypedTree.CcuThunk.EnsureDerefable(String[] requiringPath) in F:\workspace\_work\1\s\src\fsharp\TypedTree.fs:line 5103
   at FSharp.Compiler.TypedTree.NonLocalEntityRef.TryDeref(Boolean canError) in F:\workspace\_work\1\s\src\fsharp\TypedTree.fs:line 3157
   at FSharp.Compiler.TypedTree.EntityRef.get_Deref() in F:\workspace\_work\1\s\src\fsharp\TypedTree.fs:line 3254
   at FSharp.Compiler.TypedTreeOps.stripTyEqnsA(TcGlobals g, Boolean canShortcut, TType ty) in F:\workspace\_work\1\s\src\fsharp\TypedTreeOps.fs:line 739
   at FSharp.Compiler.TypedTreeOps.tyargsEnc(TcGlobals g, FSharpList`1 gtpsType, FSharpList`1 gtpsMethod, FSharpList`1 args) in F:\workspace\_work\1\s\src\fsharp\TypedTreeOps.fs:line 8035
   at FSharp.Compiler.TypedTreeOps.typeEnc(TcGlobals g, FSharpList`1 gtpsType, FSharpList`1 gtpsMethod, TType ty) in F:\workspace\_work\1\s\src\fsharp\TypedTreeOps.fs:line 8009
   at Microsoft.FSharp.Primitives.Basics.List.map[T,TResult](FSharpFunc`2 mapping, FSharpList`1 x) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\local.fs:line 247
   at FSharp.Compiler.TypedTreeOps.XmlDocArgsEnc(TcGlobals g, FSharpList`1 gtpsType, FSharpList`1 gtpsMethod, FSharpList`1 argTys) in F:\workspace\_work\1\s\src\fsharp\TypedTreeOps.fs:line 8040
   at FSharp.Compiler.TypedTreeOps.XmlDocSigOfVal(TcGlobals g, Boolean full, String path, Val v) in F:\workspace\_work\1\s\src\fsharp\TypedTreeOps.fs:line 8090
   at FSharp.Compiler.SourceCodeServices.SymbolHelpers.GetXmlDocSigOfScopedValRef(TcGlobals g, EntityRef tcref, ValRef vref) in F:\workspace\_work\1\s\src\fsharp\symbols\SymbolHelpers.fs:line 541
   at FSharp.Compiler.SourceCodeServices.FSharpMemberOrFunctionOrValue.get_XmlDocSig() in F:\workspace\_work\1\s\src\fsharp\symbols\Symbols.fs:line 1845
   at FSharp.Formatting.ApiDocs.CrossReferences.getXmlDocSigForMember(FSharpMemberOrFunctionOrValue memb) in C:\Users\Kevin\source\repos\fsprojects\FSharp.Formatting\src\FSharp.Formatting.ApiDocs\GenerateModel.fs:line 575
   at FSharp.Formatting.ApiDocs.CrossReferenceResolver.registerMember(FSharpMemberOrFunctionOrValue memb) in C:\Users\Kevin\source\repos\fsprojects\FSharp.Formatting\src\FSharp.Formatting.ApiDocs\GenerateModel.fs:line 647
   at FSharp.Formatting.ApiDocs.CrossReferenceResolver.registerEntity(FSharpEntity entity) in C:\Users\Kevin\source\repos\fsprojects\FSharp.Formatting\src\FSharp.Formatting.ApiDocs\GenerateModel.fs:line 669
   at <StartupCode$FSharp-Formatting-ApiDocs>[email protected](FSharpEntity arg00) in C:\Users\Kevin\source\repos\fsprojects\FSharp.Formatting\src\FSharp.Formatting.ApiDocs\GenerateModel.fs:line 2149
   at Microsoft.FSharp.Collections.SeqModule.Iterate[T](FSharpFunc`2 action, IEnumerable`1 source) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\seq.fs:line 497
   at FSharp.Formatting.ApiDocs.ApiDocModel.Generate(FSharpList`1 projects, String collectionName, FSharpOption`1 libDirs, FSharpOption`1 otherFlags, Boolean qualify, FSharpOption`1 urlRangeHighlight, String root, FSharpList`1 substitutions, Boolean strict) in C:\Users\Kevin\source\repos\fsprojects\FSharp.Formatting\src\FSharp.Formatting.ApiDocs\GenerateModel.fs:line 2149
   at FSharp.Formatting.ApiDocs.ApiDocs.GenerateHtmlPhased[a](FSharpList`1 inputs, String output, String collectionName, FSharpList`1 substitutions, FSharpOption`1 template, FSharpOption`1 root, FSharpOption`1 qualify, FSharpOption`1 libDirs, FSharpOption`1 otherFlags, FSharpOption`1 urlRangeHighlight, FSharpOption`1 strict) in C:\Users\Kevin\source\repos\fsprojects\FSharp.Formatting\src\FSharp.Formatting.ApiDocs\ApiDocs.fs:line 54
   at <StartupCode$fsdocs>[email protected](Unit unitVar0) in C:\Users\Kevin\source\repos\fsprojects\FSharp.Formatting\src\FSharp.Formatting.CommandTool\BuildCommand.fs:line 601
   at <StartupCode$fsdocs>.$BuildCommand.protect@338(CoreBuildOptions this, FSharpFunc`2 f) in C:\Users\Kevin\source\repos\fsprojects\FSharp.Formatting\src\FSharp.Formatting.CommandTool\BuildCommand.fs:line 340

Maybe @kMutagene can help with this.

[BUG] `Worksheet.setSheetData` corrupts Xlsx files saved in MS Excel before

Describe the bug
If an Xlsx file that has been opened and saved in MS Excel before is edited via Worksheet.setSheetData and saved, the file gets corrupted and the SheetData of this file's worksheet cannot be obtained anymore.

To Reproduce

  1. Create a new Xlsx file via FSharpSpreadsheetML and save it:
#r "nuget: FSharpSpreadsheetML"

open FSharpSpreadsheetML

let path =
    System.Environment.GetFolderPath(System.Environment.SpecialFolder.UserProfile)
    |> fun fp -> System.IO.Path.Combine(fp, "mySpreadsheet.xlsx")

let doc = Spreadsheet.init "mySheet" path

let sd = Spreadsheet.tryGetSheetBySheetName "mySheet" doc |> Option.get

SheetData.appendValueToRowAt None 1u "Hello, World!" sd

Spreadsheet.close doc
  2. Open the file in MS Excel and save it.
  3. Replace the sheet with the same sheet:
let doc = Spreadsheet.fromFile path true

let sd = Spreadsheet.tryGetSheetBySheetName "mySheet" doc |> Option.get
let wsp = Spreadsheet.tryGetWorksheetPartBySheetName "mySheet" doc |> Option.get
let ws = Worksheet.get wsp
Worksheet.setSheetData sd ws

Spreadsheet.close doc
  4. Open the file again in MS Excel
  5. See error

Expected behavior
Uncorrupted file.

OS and framework information (please complete the following information):

  • OS: Win10 Pro
  • OS Version: 10.0, Build 19042
  • .Net core SDK version: 5.0.403

[BUG] Person.removeFullName returns empty Person lists

Describe the bug
Person.removeFullName returns empty Person lists under most conditions (detailed explanation under Possible solution(s)).

To Reproduce
Steps to reproduce the behavior:

  1. Use the ArcCommander (easiest approach)
  2. Initiate ARC: arc init
  3. Add investigation: arc i create
  4. Add person: arc i person register -> Fill LastName: Doe and FirstName: John exemplarily
  5. Repeat 4. with another person (e.g. LastName: Patternman and FirstName: Max)
  6. To be sure that everything worked, check with arc i person list
  7. Unregister one of the persons: arc i person unregister -> Fill LastName and FirstName with one of the persons from before
  8. Check again with arc i person list

Expected behavior
Only one of the persons is gone.

Screenshots
image

Possible solution(s)
In \API\person.fs, the removeFullName function is defined as follows:

let removeByFullName (firstName : string) (midInitials : string) (lastName : string) (persons : Person list) =
    List.filter (fun p -> 
        if midInitials = "" then 
            p.FirstName = Some firstName && p.LastName = Some lastName
            |> not
        else 

            p.FirstName = Some firstName && p.MidInitials = Some midInitials && p.LastName = Some lastName
            |> not
    ) persons

The compiler interprets the part

p.FirstName = Some firstName && p.LastName = Some lastName
|> not

as

(p.FirstName = Some firstName) && (p.LastName = Some lastName
|> not)

Either change to

(p.FirstName = Some firstName && p.LastName = Some lastName)
|> not

or, more elegantly, to

p.FirstName <> Some firstName || p.LastName <> Some lastName
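
By De Morgan's law, negating `a && b` gives `not a || not b`, so a `<>`-based predicate has to combine its parts with `||`. A minimal sketch on a hypothetical simplified record (`PersonSketch` and `removeByName` are illustration names; the real Person type has more fields and the real function also handles mid initials):

```fsharp
// Hypothetical simplified record; the real Person type has more fields.
type PersonSketch =
    { FirstName: string option
      LastName: string option }

/// Keeps every person whose first and last name do not both match.
let removeByName (firstName: string) (lastName: string) (persons: PersonSketch list) =
    persons
    |> List.filter (fun p ->
        // not (a && b) is equivalent to (not a || not b)
        p.FirstName <> Some firstName || p.LastName <> Some lastName)

let persons =
    [ { FirstName = Some "John"; LastName = Some "Doe" }
      { FirstName = Some "Max"; LastName = Some "Patternman" } ]

let remaining = removeByName "John" "Doe" persons
```

With this predicate only the entry matching both names is dropped; a person who shares just the first name or just the last name stays in the list.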

[BUG] Assay: Headers with strange characters are not parsable

Describe the bug
If an Assay has headers (regardless of whether they are Parameter, Factor, or Characteristic) that contain unusual characters (namely ( or [), the column cannot be parsed.

To Reproduce
Steps to reproduce the behavior:

  1. Create Assay with column headers hello (world) and hello [world]
  2. Try parse with common ISA.NET functions
  3. See error

Expected behavior
Either

  • parse correctly, or
  • throw error or warning that the respective column cannot be parsed
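
A sketch of a header parser that tolerates brackets and parentheses inside the name, assuming headers follow the ISA-Tab style `Kind [name]`. `tryParseHeader` is a hypothetical helper, not an existing ISADotNet function:

```fsharp
open System.Text.RegularExpressions

// The kind is everything before the first '['; the name is everything up to
// the last ']', so nested brackets and parentheses stay part of the name.
let headerPattern = Regex(@"^(?<kind>[^\[]+?)\s*\[(?<name>.*)\]$")

/// Returns Some (kind, name) for bracketed headers and None otherwise,
/// so that callers can warn about columns that cannot be parsed.
let tryParseHeader (header: string) : (string * string) option =
    let m = headerPattern.Match(header.Trim())
    if m.Success then
        Some (m.Groups.["kind"].Value, m.Groups.["name"].Value)
    else
        None

let parsed = tryParseHeader "Parameter [hello [world]]"
```

Returning an option covers both expected behaviors: a successful parse yields the name, and `None` gives the caller a place to raise an error or warning.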

Add ISATab reading and writing capability

Currently, ISATab-style files can only be read and written as binary XLSX files. This should be extended to also cover plain-text Tab files, as originally intended by the format specification.
This should be pretty straightforward for investigation files. For assay and study files, though, some planning is necessary, as plain-text files don't support multiple sheets like XLSX files do.

[Feature Request] Quality of life

  • Change Create static methods to optional parameters

  • Adjust naming: GetNameAsString should be changed to GetName, as name is already a string

  • For consistency, change methods to static methods
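
For illustration, a static `create` method with optional parameters could look like the sketch below. `CommentSketch` is a hypothetical record and not one of the ISADotNet types:

```fsharp
// Hypothetical record used only to illustrate the pattern.
type CommentSketch =
    { Name: string option
      Value: string option }

    /// Every argument is optional; omitted arguments become None.
    static member create (?Name: string, ?Value: string) : CommentSketch =
        { Name = Name; Value = Value }

let comment = CommentSketch.create (Name = "Author")
```

Callers then name only the fields they want to set instead of passing every argument positionally.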

[BUG] ProcessSequence.getOutputsWithCharacteristicBy does not return correct Characteristics

Describe the bug
The function ISADotNet.API.ProcessSequence.getOutputsWithCharacteristicBy does not return the Characteristics and Outputs of the given protocol. Instead, the Characteristics of the Protocol where the outputs serve as Input are returned.

To Reproduce
Steps to reproduce the behavior:

  • Take an isa assay file with at least two protocols and one characteristic each
  • Try to retrieve the characteristic of protocol one with the function ISADotNet.API.ProcessSequence.getOutputsWithCharacteristicBy

Expected behavior
The function should return the characteristics of the protocol that matches the predicate.

[BUG] JsonReader fails for arrays in AnyOf cases

The following code

let s = """
    {
      "characteristics": [
        {},
        {}
      ]
    }
"""

JsonSerializer.Deserialize<ProcessOutput>(s,JsonExtensions.options)

fails with System.Collections.Generic.KeyNotFoundException: An index satisfying the predicate was not found in the collection.

Directly trying to deserialize a sample works fine:
JsonSerializer.Deserialize<Sample>(s,JsonExtensions.options)

ProcessOutput is an AnyOf union type where Sample is a case. It is therefore safe to assume that the problem is specific to the AnyOf deserializer.

Removing one of the two items inside the characteristics property also alleviates the problem. So it seems this problem is also specific to item lists inside an AnyOf item.

[Suggestion] Unify Assay and Assays namespaces

It took me very long to find that you cannot write a single Assay via something like Assay.write. Instead, even when only writing a single Assay file, this seems to be necessary:

open ISADotNet.XLSX

[Assay.empty]
|> Assays.write ...

Also, this does not seem to be the function that just writes an assay.xlsx file, or is it? Does something like that exist?

So what i am looking for is something like Investigation.toFile for a single Assay. Can this be done currently?

I would also suggest dropping the s from Assays to provide a more unified API across IsaDotNet and IsaDotNet.XLSX

[Feature Request] Read assay and study files as rows

An ISAXLSX assay file reader was implemented in the latest release. This reader creates an object of type Assay. This is useful for interop with ISAJson or the investigation file, but can be cumbersome to scan through computationally in other cases. The problem is that the information which is depicted row-wise in the assay file gets dispersed into different places of the data model.
Instead, an additional obvious approach would be to read the assay file table as rows.

[BUG] Cannot write study containing samples with different amount of characteristics as ISA-Tab.

Describe the bug
A Study file that looks like this:

Source Name    Characteristics[c1]    Characteristics[c2]    Sample Name
src_1          yes                    (empty)                smpl_1
src_2          yes                    yes                    smpl_2

cannot be written via the current API.

To Reproduce

#r "nuget: ISADotNet.XLSX"

open ISADotNet
open ISADotNet.XLSX

let createStudyProcess sourceName sampleName procName protName (characteristics: (string*string) list) =
    let c =
        characteristics
        |> List.mapi (fun i (k,v) ->
            MaterialAttributeValue.create(
                Category=MaterialAttribute.fromStringWithValueIndex k "" "" i,
                Value= Value.Name v
            )
        )
    let src =
        Source.create(Name = sourceName, Characteristics = c) |> ProcessInput.Source
    let smpl =
        Sample.create(Name = sampleName) |> ProcessOutput.Sample
    let proc =
        Process.create(Name=procName, ExecutesProtocol=Protocol.create(Name = protName), Inputs=[src], Outputs = [smpl])
    proc

let s =
    Study.create(
        ProcessSequence = [
            createStudyProcess "src_1" "smpl_1" "p1" "1" ["c1","yes"]
            createStudyProcess "src_2" "smpl_2" "p2" "1" ["c1","yes"; "c2","yes"]
        ])

s
|> StudyFile.Study.toFile "test.xlsx"

Expected behavior
Writes study to file

Actual behavior

System.Exception: Could not write Study to Xlsx file in path "test.xlsx": 
        The lists had different lengths.
list.[0] is 3 elements shorter than list.[2] (Parameter 'list.[0]')
   at Microsoft.FSharp.Core.PrintfModule.PrintFormatToStringThenFail@1439.Invoke(String message)
   at ISADotNet.XLSX.StudyFile.Study.toFile(String p, Study study)
   at <StartupCode$FSI_0013>.$FSI_0013.main@() in C:\Users\schne\Desktop\test\Untitled-1:line 155
Stopped due to error

OS and framework information (please complete the following information):

  • Windows 11
  • OS Version 10.0.22000 Build 22000
  • .Net 6.0.302
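
One way a writer could cope with rows that carry different sets of characteristics is to build the column headers from all rows first and then align each row by name. This is a sketch on plain key/value lists; `collectHeaders` and `alignRow` are hypothetical helpers and not part of ISADotNet.XLSX:

```fsharp
/// Collects all distinct characteristic names in first-seen order.
let collectHeaders (rows: (string * string) list list) : string list =
    rows |> List.collect (List.map fst) |> List.distinct

/// Produces one cell per header; characteristics missing from a row become "".
let alignRow (headers: string list) (row: (string * string) list) : string list =
    let lookup = Map.ofList row
    headers |> List.map (fun h -> defaultArg (Map.tryFind h lookup) "")

let rows =
    [ [ "c1", "yes" ]
      [ "c2", "yes2"; "c1", "yes1" ] ]

let headers = collectHeaders rows
let aligned = rows |> List.map (alignRow headers)
```

Because every row is looked up by header name, all rows end up with the same number of cells regardless of how many characteristics they carry or in which order they were given.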

[Question] Can a QSheet have more than one protocol?

I use the following code to ensure I have exactly one QSheet. My question: in this case, can Protocols have any number of items other than 1? And what real-life cases would cause this?

let assay = Assay.fromString jsonString
let qAssay = QueryModel.QAssay.fromAssay assay
if qAssay.Sheets.Length <> 1 then
    failwith "Swate was unable to identify the information from the requested template (<Found more than one process in template>). Please open an issue for the developers."
let template = qAssay.Sheets.Head
template // QSheet
template.Protocols // Protocol list

[BUG] Characteristics order matter when creating study files

Describe the bug
Characteristics seem to just be written in the order they are passed to the process, ignoring their actual names

To Reproduce

#r "nuget: ISADotNet.XLSX"

open ISADotNet
open ISADotNet.XLSX

let createStudyProcess sourceName sampleName procName protName (characteristics: (string*string) list) =
    let c =
        characteristics
        |> List.mapi (fun i (k,v) ->
            MaterialAttributeValue.create(
                Category=MaterialAttribute.fromStringWithValueIndex k "" "" i,
                Value= Value.Name v
            )
        )
    let src =
        Source.create(Name = sourceName, Characteristics = c) |> ProcessInput.Source
    let smpl =
        Sample.create(Name = sampleName) |> ProcessOutput.Sample
    let proc =
        Process.create(Name=procName, ExecutesProtocol=Protocol.create(Name = protName), Inputs=[src], Outputs = [smpl])
    proc

let s =
    Study.create(
        ProcessSequence = [
            createStudyProcess "src_1" "smpl_1" "p1" "1" ["c1","yes1"; "c2","yes2"]
            createStudyProcess "src_2" "smpl_2" "p2" "1" ["c2","yes2"; "c1","yes1"]
        ])

s
|> StudyFile.Study.toFile "test.xlsx"

Expected behavior
Writes correct study file:

Source Name Characteristics[c1] Characteristics[c2] Sample Name
src_1 yes1 yes2 smpl_1
src_2 yes1 yes2 smpl_2

Actual behavior

the order of the second sample is wrong:

image

OS and framework information (please complete the following information):

  • Windows 11
  • OS Version 10.0.22000 Build 22000
  • .Net 6.0.302

Update Api

To update the current API state we need to bring some of the type restrictions up to date with the DataModel changes done in previous commits. In addition we should redistribute the API modules over the .fs files in the API folder.

  • Update type restrictions.
  • Redistribute API modules over .fs files (leave API.fs for now as deprecated)
  • As discussed, I added a general record type update type (see Update.fs). This allows a user to define how a record type should be updated.
  • Add unit tests for Update.fs.

[Feature Request] Add path equality functions

Is your feature request related to a problem? Please describe.
ISADotNet.API.Assay.existsByFileName will return false if the given value is "assayFileName\isa.assay.xlsx" but the assay object contains the value "assayFileName/isa.assay.xlsx".

Describe the solution you'd like
The function should be able to recognize that the two paths are equal, even though the two strings are different.
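
A minimal sketch of such a comparison, assuming that normalizing the directory separators is sufficient. `normalizePath` and `pathEquals` are hypothetical names:

```fsharp
/// Normalizes directory separators to '/' so that paths compare as plain strings.
let normalizePath (path: string) : string =
    path.Replace('\\', '/')

/// Compares two paths independently of the separator style used.
let pathEquals (p1: string) (p2: string) : bool =
    normalizePath p1 = normalizePath p2

let same = pathEquals @"assayFileName\isa.assay.xlsx" "assayFileName/isa.assay.xlsx"
```

This deliberately ignores case sensitivity and relative segments such as `..`; whether those should also be treated as equal is a separate design decision.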

[Bug] Assay File Reader fails when no custom xml is present

Assay File Reader fails when no CustomXml was added with the Swate tool

System.ArgumentException: The input sequence was empty.
Parameter name: source
   at Microsoft.FSharp.Collections.SeqModule.Head[T](IEnumerable`1 source) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\seq.fs:line 1364
   at ISADotNet.XLSX.AssayFile.SwateTable.readSwateTables(WorkbookPart wbp)
   at ISADotNet.XLSX.AssayFile.AssayFile.fromFile(String path)
   at <StartupCode$FSI_0005>.$FSI_0005.main@()
Stopped due to error
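
The stack trace points to `Seq.head` being called on an empty sequence. A sketch of the safer pattern, where the custom XML parts are modelled as a plain `seq<string>` (a hypothetical stand-in for what the reader actually inspects):

```fsharp
// Hypothetical stand-in: each string represents one custom XML part of a workbook.
let readFirstCustomXml (customXmlParts: seq<string>) : string option =
    // Seq.head throws on an empty sequence; Seq.tryHead returns None instead.
    Seq.tryHead customXmlParts

let noXml = readFirstCustomXml Seq.empty
let someXml = readFirstCustomXml (seq { yield "<SwateTable />" })
```

With an option result, a file without Swate custom XML can be read as an assay without table metadata instead of failing.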

[BUG] Writing xlsx files looses numberFormat info

Describe the bug
I am working on a function to write Swate tables from an external data type. The function looks like this:

/// tables is the external datatype parsed to `Assay`
let assay =  Export.parseBuildingBlockSeqsToAssay tables
let a = QueryModel.QAssay.fromAssay assay
let wb = 
    workbook {
        for (i,s) in List.indexed a.Sheets do QSheet.toSheet i s
        sheet "Assay" {
            for r in MetaData.toDSLSheet assay [] do r
        }
    }
/// Parsing unit is not done correctly.
let fsSpreadsheet = wb.Value.Parse().ToBytes()

Most code is taken from here https://github.com/nfdi4plants/ISADotNet/blob/a06af930e4d3f9d3c49a7b07bb0496f927c4e6cc/src/ISADotNet.XLSX/AssayFile/Assay.fs#L188

If I try to write Swate unit columns, the final .xlsx file does not contain any numberFormat information. All cells have DataType string, even though it should be something like this "0.00 \"unit\"".

Image shows Cell.DataType and Cell.Value
image

If I convert my assay to JSON with the following code, all unit information is still there, so I assume the information is lost somewhere between ISADotNet and SpreadsheetFs.

/// tables is the external datatype parsed to `Assay`
let assay =  Export.parseBuildingBlockSeqsToAssay tables
let parsedJsonStr = ISADotNet.Json.Assay.toString assay
// no unit information lost.

[BUG] ISA-Tab Study file omits different characteristics when samples have the same amount of characteristics, but different annotations

Describe the bug
A Study file that looks like this:

Source Name    Characteristics[c1]    Characteristics[c2]    Sample Name
src_1          yes                    (empty)                smpl_1
src_2          (empty)                yes                    smpl_2

cannot be written correctly via the current API.

To Reproduce

#r "nuget: ISADotNet.XLSX"

open ISADotNet
open ISADotNet.XLSX

let createStudyProcess sourceName sampleName procName protName (characteristics: (string*string) list) =
    let c =
        characteristics
        |> List.mapi (fun i (k,v) ->
            MaterialAttributeValue.create(
                Category=MaterialAttribute.fromStringWithValueIndex k "" "" i,
                Value= Value.Name v
            )
        )
    let src =
        Source.create(Name = sourceName, Characteristics = c) |> ProcessInput.Source
    let smpl =
        Sample.create(Name = sampleName) |> ProcessOutput.Sample
    let proc =
        Process.create(Name=procName, ExecutesProtocol=Protocol.create(Name = protName), Inputs=[src], Outputs = [smpl])
    proc

let s =
    Study.create(
        ProcessSequence = [
            createStudyProcess "src_1" "smpl_1" "p1" "1" ["c1","yes"]
            createStudyProcess "src_2" "smpl_2" "p2" "1" ["c2","yes"]
        ])

s
|> StudyFile.Study.toFile "test.xlsx"

Expected behavior
Writes correct study file

Actual behavior

Both samples get annotated via Characteristics [c1], Characteristics [c2] is omitted:

image

OS and framework information (please complete the following information):

  • Windows 11
  • OS Version 10.0.22000 Build 22000
  • .Net 6.0.302

[BUG] `StudyFile.Study.fromFile` returns some empty lists in the resulting study

Describe the bug
StudyFile.Study.fromFile returns some empty lists in the resulting study in record fields

  • .Protocols
  • .ProcessSequence
  • .Factors
  • .CharacteristicCategories

To Reproduce
Steps to reproduce the behavior:

  1. Create a study with the ArcCommander
  2. Load it via StudyFile.Study.fromFile
  3. See empty lists

Expected behavior
Nones instead of Some []s.
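
A small helper along these lines could normalize the fields after reading. `noneIfEmpty` is a hypothetical name and not an existing ISADotNet function:

```fsharp
/// Turns Some [] into None and leaves every other value untouched.
let noneIfEmpty (l: 'T list option) : 'T list option =
    match l with
    | Some [] -> None
    | other -> other

let emptyProtocols : string list option = noneIfEmpty (Some [])
let keptFactors = noneIfEmpty (Some [ "temperature" ])
```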

Screenshots
The difference between the same study, once from the Study file itself (top) and once from the Investigation file (bottom):
image

[BUG] Assay from Investigation File and Assay from AssayFile are not the same even if they are the same

Describe the bug
When using AssayFile.Assay.fromFile and (Investigation.fromFile <path>).Studies.Value.Assays.Value.[i].Value, the resulting assays are not identical even if they were initialized together in one ARC (e.g. via ArcCommander).

To Reproduce
Steps to reproduce the behavior:

  1. Initiate ARC
  2. Create Investigation
  3. Add Assay
  4. Use arc a edit -a <assayID>

Expected behavior
Assay objects are identical.

Screenshots
image

Add characteristics only to input when Reading Assay.xlsx files

Currently, when an assay.xlsx file is read, the MaterialAttributeValues (or Characteristics) which are located as columns between inputs and outputs are assigned to both. To reduce ambiguity when accessing a given characteristic or when writing back to an assay.xlsx file, the characteristic could instead be appended only to the input.

[BUG] Assay: Different values for a Characteristic for the same Source Name should not be possible

Describe the bug
When creating an Assay and setting several rows with the same Source Name, it should not be possible to set different values for a Characteristic, but currently it is.

To Reproduce
Steps to reproduce the behavior:

  1. Create an Assay with a Swate table
  2. Add Characteristic MyCharacteristic 1
  3. Write MySource in the first value row for column Source Name
  4. Write the same in the second row
  5. Write value 1 in the first value row for column My Characteristic 1
  6. Write value 2 in the second value row for column My Characteristic 1 (see screenshot below for steps 2–6)
  7. Parse with common ISA.NET functions
  8. Notice that there are several inputs with different Characteristics yet with the same name

Expected behavior
Forbid this (e.g., by throwing an error or something similar).
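
A sketch of how such conflicts could be detected before parsing, assuming the table has been reduced to (Source Name, characteristic value) pairs. `findConflicts` is a hypothetical helper:

```fsharp
/// Returns every source name that carries more than one distinct value
/// for the characteristic, together with the conflicting values.
let findConflicts (rows: (string * string) list) : (string * string list) list =
    rows
    |> List.groupBy fst
    |> List.choose (fun (source, entries) ->
        let values = entries |> List.map snd |> List.distinct
        if List.length values > 1 then Some (source, values) else None)

let conflicts = findConflicts [ "MySource", "1"; "MySource", "2"; "Other", "1" ]
```

An empty result means the table is consistent; otherwise the returned pairs can be used to build the error message.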

Screenshots
image

[Feature Request] ReadMe With Quickstart Guide

A README with additional information is needed. This could include

  • The aim of this repository
  • Build guide
  • A link to the nuget package
  • Overview of the project structure (ISA Datamodel, JsonIO and XLSX IO)
  • Quick ISA explanation with links
  • Some code examples (Which could be moved to a gh-pages documentation if growing)

[Feature Request] Protocol Columns

Is your feature request related to a problem? Please describe.
Assay and Study xlsx files will soon be containing more columns describing the underlying protocol. These will be:

  • Protocol Type (+ Term Source REF and Term Accession Number)
  • Protocol REF
  • Protocol Description
  • Protocol URI
  • Protocol Version

Describe the solution you'd like
Assay (and Study) xlsx file writers should be able to handle these new columns.

References
nfdi4plants/Swate#207
nfdi4plants/nfdi4plants_ontology#32

[BUG] Json AnyOf reader fails for escaped string

Describe the bug
The given string "string \"inside\" string"

  • Outer quotation marks are to depict it being a json string
  • Inner escaped quotation marks are part of the string value itself

Can be correctly handled by json deserializer if it is deserialized as a string, but fails if it is deserialized as part of an AnyOf object.

To Reproduce

works:

"\"string \\\"inside\\\" string\""
|> ISADotNet.JsonExtensions.fromString<string>

fails:

"\"string \\\"inside\\\" string\""
|> ISADotNet.Json.Value.fromString

Expected behavior
Should result in

val it : ISADotNet.Value = Name "string "inside" string"

Additional context
Error message

System.Collections.Generic.KeyNotFoundException: An index satisfying the predicate was not found in the collection.
   at Microsoft.FSharp.Collections.ArrayModule.loop@448-37[T,TResult](FSharpFunc`2 chooser, T[] array, Int32 i) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\array.fs:line 450
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadCore[TValue](Utf8JsonReader& reader, Type returnType, JsonSerializerOptions options)
   at System.Text.Json.JsonSerializer.Deserialize[TValue](String json, Type returnType, JsonSerializerOptions options)
   at <StartupCode$FSI_0073>.$FSI_0073.main@()
Stopped due to error

Worksheet name parsing from assay to CommonAPI rowMajor mistake

I translated one of the existing Swate templates to the new preview format and used the new common api row major format with it.
In Assay the worksheet name was correct, but after parsing to RowWiseSheet it was wrong: 1SPL01_plants had been changed to 1SPL01.

The problem lies in https://github.com/nfdi4plants/ISADotNet/blob/AssayFileIO/src/ISADotnet/JsonIO/AssayCommonAPI.fs#L98 .

I am currently working on two different versions of a fix.

Term Accession of most parsed column headers is done incorrectly.

Currently, when parsing a Swate AnnotationTable with ISADotNet to any ISA-JSON format, ontology terms are parsed as shown below. The Term Accession should instead be in URI format, which can be created by combining the existing TSR and TAN with a static OBO PURL URL:

"category": {
      "characteristicType": {
          "annotationValue": "Sample type",
          "termSource": "NFDI4PSO",
          "termAccession": "0000064"
      }
  }

Purl URL:

"http://purl.obolibrary.org/obo/"
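
A rough sketch of how the combination could look. The helper name `toAccessionUri` is hypothetical, and joining TSR and TAN with an underscore is an assumption based on the usual OBO identifier shape; the issue itself only states that both are combined with the static URL.

```fsharp
// Static OBO PURL prefix quoted in this issue.
let oboPurl = "http://purl.obolibrary.org/obo/"

/// Hypothetical helper: builds a term accession URI from the
/// Term Source REF (tsr) and the Term Accession Number (tan).
/// Assumes the shape <prefix><TSR>_<TAN>.
let toAccessionUri (tsr: string) (tan: string) =
    sprintf "%s%s_%s" oboPurl tsr tan

// Values taken from the JSON example above.
let uri = toAccessionUri "NFDI4PSO" "0000064"
// uri = "http://purl.obolibrary.org/obo/NFDI4PSO_0000064"
```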

Currently working on a PR to add this.

Enable consecutive setting of ISA object fields by the API

Hi there, I am currently testing this library in a project. There are a few things I have noticed (more issues incoming). The first one is very important imho.

How is this library intended to be used in a script? My naive approach without docs would be trying to build the object hierarchy from the ground up and then writing the whole thing to disk.

So like:
build my assay(s) -> add them to a study -> add that to an investigation -> save the whole thing to disk.

The very first step is very tedious, partly because of the forced usage of option (but this is already addressed in #24), partly because the parameter order is not pipeline-friendly.

So for example, if I want to set the samples of an empty assay, this has to be done:

let assay = Assay.setMaterials Assay.empty (AssayMaterials.create (Some Samples) None)

Note that the assay is the first parameter of Assay.setMaterials. I would suggest switching the parameters everywhere, so that the object that gets changed is the last one.
I would like to be able to do something like this:

let assay =
    Assay.empty
    |> Assay.setMaterials(
        Materials.create(
            ?OptField1 = ... // ignore second field because I don't know its value by using optional parameters here
        )
    )
    |> Assay.setData(
        Data.create (...)
    )
    ...

Or am I just not using this as intended? If so, how would it be intended?
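
To make the suggestion concrete, here is a small self-contained sketch with hypothetical, heavily simplified stand-ins for the ISA types; none of the types or functions below are the library's actual API. It shows the two ingredients asked for above: setters that take the changed object as the last parameter, and a `create` member with optional parameters.

```fsharp
// Hypothetical, simplified stand-ins for the ISA types.
type Materials =
    { Samples: string list option
      OtherMaterials: string list option }
    /// Optional parameters let callers skip fields they have no value for.
    static member create (?Samples: string list, ?OtherMaterials: string list) =
        { Samples = Samples; OtherMaterials = OtherMaterials }

type Assay =
    { FileName: string option
      Materials: Materials option }

module Assay =
    let empty = { FileName = None; Materials = None }

    /// The assay is the last parameter, so the setter composes with |>.
    let setMaterials (materials: Materials) (assay: Assay) =
        { assay with Materials = Some materials }

    let setFileName (fileName: string) (assay: Assay) =
        { assay with FileName = Some fileName }

let assay =
    Assay.empty
    |> Assay.setMaterials (Materials.create(Samples = [ "Sample1" ]))
    |> Assay.setFileName "assay.xlsx"
```

With the object in last position, each setter is a plain `Assay -> Assay` function once its value is applied, which is what makes the pipeline form possible.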

Update function not working for record in record

Problem

When updating one record type with another using the methods in API.Update, even when using the UpdateByExisting option, a field with a filled record is replaced by a record where no value is set.

An example of this is the Measurementtype field of the Assay type.

Solution

There should be a way to detect empty records and ignore them, possibly by using an option type for records nested in records.
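
A sketch of the option-based idea, using hypothetical simplified types (`Annotation`, `AssayInfo`) and a hypothetical `updateKeepingFilled` function rather than the actual API.Update code. Wrapping the nested record in an option makes "not set" explicit, so an unset incoming value can be told apart from a filled one.

```fsharp
// Hypothetical, simplified types; the nested record is wrapped in an option.
type Annotation = { Name: string; TermAccession: string }

type AssayInfo =
    { FileName: string
      MeasurementType: Annotation option }

/// Takes fields from the incoming record, but keeps the existing
/// nested record when the incoming one is not set.
let updateKeepingFilled (existing: AssayInfo) (incoming: AssayInfo) =
    { FileName = incoming.FileName
      MeasurementType = incoming.MeasurementType |> Option.orElse existing.MeasurementType }

let existing =
    { FileName = "old.xlsx"
      MeasurementType = Some { Name = "Proteomics"; TermAccession = "" } }

let incoming = { FileName = "new.xlsx"; MeasurementType = None }

// The filled MeasurementType survives; FileName is taken from the incoming record.
let updated = updateKeepingFilled existing incoming
```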

[BUG] FromString functions in ISADotNet.XLSX erroneous for lists of records

When trying to parse aggregated strings as used in ISATab/ISAXLSX format, two problems occur:

  1. Even when given empty strings, one empty element is returned. But the result should actually be an empty list:
    OntologyAnnotation.fromAggregatedStrings ';' "" "" ""
    will result in
    [{ ID = null ;Name = Text "" ;TermSourceREF = "" ;TermAccessionNumber = "" ;Comments = [] }]

  2. When given multiple names but no other values, records with only names should be created. Instead, it crashes because of the differing input lengths:
    OntologyAnnotation.fromAggregatedStrings ';' "OA1;OA2" "" ""
    results in
    System.ArgumentException: The arrays have different lengths. array1.Length = 2, array2.Length = 1, array3.Length = 1
    Parameter name: array1, array2, array3
       at Microsoft.FSharp.Core.DetailedExceptions.invalidArg3ArraysDifferent[?](String arg1, String arg2, String arg3, Int32 len1, Int32 len2, Int32 len3) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\local.fs:line 69
       at Microsoft.FSharp.Collections.ArrayModule.Map3[T1,T2,T3,TResult](FSharpFunc`2 mapping, T1[] array1, T2[] array2, T3[] array3) in F:\workspace\_work\1\s\src\fsharp\FSharp.Core\array.fs:line 309
       at ISADotNet.XLSX.OntologyAnnotation.fromAggregatedStrings(Char separator, String terms, String accessions, String source)
       at <StartupCode$FSI_0041>.$FSI_0041.main@()
    Stopped due to error
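
Both problems could be avoided along the following lines. This is only a sketch: `splitAggregated`, `padTo` and `fromAggregated` are hypothetical helpers, and the result is a list of plain tuples instead of OntologyAnnotation records. Empty inputs produce no elements, and shorter inputs are padded with empty strings before the three arrays are combined.

```fsharp
/// Splits an aggregated string; an empty input yields no elements at all.
let splitAggregated (separator: char) (s: string) : string [] =
    if System.String.IsNullOrEmpty s then [||] else s.Split(separator)

/// Pads an array with empty strings up to the given length.
let padTo (length: int) (values: string []) =
    Array.append values (Array.create (max 0 (length - values.Length)) "")

/// Hypothetical stand-in for fromAggregatedStrings, returning
/// (term, accession, source) tuples.
let fromAggregated (separator: char) (terms: string) (accessions: string) (source: string) =
    let ts = splitAggregated separator terms
    let accs = splitAggregated separator accessions
    let srcs = splitAggregated separator source
    // Use the longest input as the target length so Array.map3 never sees unequal arrays.
    let n = Array.max [| ts.Length; accs.Length; srcs.Length |]
    Array.map3 (fun t a s -> t, a, s) (padTo n ts) (padTo n accs) (padTo n srcs)
    |> List.ofArray

let noItems = fromAggregated ';' "" "" ""          // []
let namesOnly = fromAggregated ';' "OA1;OA2" "" "" // [("OA1", "", ""); ("OA2", "", "")]
```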

Trim() all values

It is very easy to accidentally add whitespace to cells in Excel.

This whitespace is then parsed as an existing value. Add .Trim() to all value access functions to fix this.

Reproduce

  1. Open any tableBody cell from a swate table and add (whitespace) to it.
  2. Parse to Assay or CommonAPI rowMajor format
  3. In the JSON, the value appears as " "

Expected behaviour

  1. If a cell contains only whitespace, don't count it as a value
  2. .Trim() added to all value parsers (header, unit, cell value) will take care of this issue.
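
The two expectations above could be covered by one small helper. `tryCellValue` is a hypothetical name, not an existing ISADotNet function; it trims the raw cell text and treats whitespace-only cells as missing.

```fsharp
/// Trims a cell value and treats whitespace-only (or null) cells as missing.
let tryCellValue (raw: string) : string option =
    if System.String.IsNullOrWhiteSpace raw then None
    else Some (raw.Trim())

let blank = tryCellValue " "         // None
let trimmed = tryCellValue " Leaf "  // Some "Leaf"
```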
