seqan / seqan3


The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.

Home Page: https://www.seqan.de

License: Other

C++ 97.76% CMake 2.07% Perl 0.03% Shell 0.13%

sequence-analysis seqan cpp17 cpp20 cpp-concepts bioinformatics blast sequence-alignment fasta fastq

seqan3's Introduction

SeqAn3 -- the modern C++ library for sequence analysis

SeqAn3 is the new version of the popular SeqAn template library for the analysis of biological sequences. It enables the rapid development of high-performance solutions by providing generic algorithms and data structures for:

sequence representation and transformation
full-text indexing and efficient search
sequence alignment
input/output of common file formats

By leveraging Modern C++ it provides unprecedented ease-of-use without sacrificing performance.

Please see the online documentation for more details.

Quick facts

C++ header-only library: easy to integrate with your app & easy to distribute
liberal open source license: allows integration with any app or library, requires only attribution
very high code quality standards: >97% unit test coverage, performance regression tests, ...
extensive API documentation & tutorials: more lines of documentation than lines of code
aims to support any 64-bit architecture running Linux/POSIX; currently big-endian CPU architectures like s390x are less supported

Dependencies

	requirement	version	comment
compiler	GCC	≥ 11	no other compiler is currently supported!
build system	CMake	≥ 3.5	optional, but recommended
required libs	SDSL	≥ 3.0.3
optional libs	cereal	≥ 1.3.1	required for serialisation and CTD support
	zlib	≥ 1.2	required for `*.gz` and `.bam` file support
	bzip2	≥ 1.0	required for `*.bz2` file support

Usage

We recommend that you use CMake to build your project:

Setup-Tutorial
Using CMake guarantees that all optional dependencies are automatically detected and activated.

Quick-Setup without CMake:

Clone the repository with submodules: git clone --recurse-submodules https://github.com/seqan/seqan3.git
Add the following to your compiler invocation:
- the include directories of SeqAn and its dependencies
- C++20 mode
- Macros indicating the presence of zlib and bzip2 (set only if actually available in your paths!)
The command could look like this:

g++-11 -O3 -DNDEBUG -Wall -Wextra                               \
    -std=c++20                                                  \
    -I       /path/to/seqan3/include                            \
    -isystem /path/to/seqan3/submodules/sdsl-lite/include       \
    -isystem /path/to/seqan3/submodules/cereal/include          \
    -DSEQAN3_HAS_ZLIB=1 -DSEQAN3_HAS_BZIP2=1                    \
    -lz -lbz2 -pthread                                          \
  your_file.cpp

Sponsorships

Vercel is kind enough to sponsor our documentation preview-builds within our pull requests. Check them out!

seqan3's People

Contributors

rrahn gurgese joergi-w smehringer sarahet mariehoffmann xenigmax kneubert eldariont xp3i4 marehr temehi cpockrandt h-2 biocodings qpcr4vir rresta eseiler giesselmann sakuraxdudut joshuak94 mitradarja svnbgnk firstlovelife fvangef tloka clemapfel tolufash jpfeuffer mdcao irallia rhagenson anbratu baiyuanxiang mr-c lyannic dendisuhubdy wvandertoorn sgssgene simonsasse thwelln haoxianglin 5l1v3r1 bfowle kloetzl xuesap yesimon lhovo fw1121 eaasna remyschwab xaleva catx1024 motz61 thoughtsynapse hosseinem jensuweulrich ajunlonglive seqan-actions lutfia95 bkille genostack schaudge gzhoffie baajarmeh smbzhang rnaimehaom wang-kaifei test-runs-tci tsnorri barracuda156 nfc35 calinteodorescu sarvex stan-dale francoiscarouge gerhobbelt cauliyang jhu99 chrisarg

seqan3's Issues

Remove static_asserts from alphabet module

... and double-check that test-cases exist for this and if not, create them!

wiki: document concepts as doxygen interfaces

A good workaround for doxygen not supporting concepts seems to be excluding them from the documentation build via \cond and \endcond and then just defining them via the \interface command. It is also possible to define the required functions on this concept as members or related functions of that concept.

Then we can also mark other types as \implements concept_name and have concepts that refine other concepts specify \extends other_concept. This should provide for really nice documentation!

Here is an example of the alphabet_concept:

/*!\interface seqan3::alphabet_concept <>
 * \brief The generic alphabet concept that covers most data types used in ranges.
 * \ingroup alphabet
 *
 * The requirements for this concept are given as related functions and metafunctions.
 * Types that satisfy this concept are shown as "implementing this interface".
 */
/*!\fn auto seqan3::to_char(alphabet_concept const c)
 * \brief Returns the alphabet letter's value in character representation.
 * \relates seqan3::alphabet_concept
 * \param c The alphabet letter that you wish to convert to char.
 * \returns The letter's value in the alphabet's char type (seqan3::underlying_char).
 * ...
 */
//!\cond
template <typename t>
concept bool alphabet_concept = requires (t t1, t t2)
{
    // StL concepts
    requires std::is_pod_v<t> == true;
    requires std::is_swappable_v<t> == true;

    // static data members
    alphabet_size<t>::value;

    // conversion to char and rank
    { to_char(t1) } -> underlying_char_t<t>;
    { to_rank(t1) } -> underlying_rank_t<t>;
    { std::cout << t1 };

    // assignment from char and rank
    { assign_char(t1,  0) } -> t &;
    { assign_rank(t1,  0) } -> t &;
    { assign_char(t{}, 0) } -> t &&;
    { assign_rank(t{}, 0) } -> t &&;

    // required comparison operators
    { t1 == t2 } -> bool;
    { t1 != t2 } -> bool;
    { t1 <  t2 } -> bool;
    { t1 >  t2 } -> bool;
    { t1 <= t2 } -> bool;
    { t1 >= t2 } -> bool;
};
//!\endcond

@xenigmax can you do this for the existing concept definitions (alphabet, nucleotide...)? For the general concepts (range, core...) there doesn't need to be any function or requirement definition, but the correct inheritance/extension would be nice.

Quality alphabet TODOs

I thought about the name illumina18 again, and I think it is confusing, because for the other alphabets the number at the end denotes the size. The current illumina18 can also represent the original Sanger qualities.

I would therefore propose:

phred42:
- rename of the current illumina18
- phred_type should be changed to uint8_t since it is only positive
- documentation should indicate that it can represent illumina >= 1.8 and original sanger
- phred range is [0, 41] (hence "phred42")
- values larger than 41 should be mapped to 41, also assert values not larger than 62
- offset_phred can be removed
phred63:
- same as phred42
- except that phred range is [0, 62] (hence "phred63")
- only assert values not larger than 62
phred68legacy:
- same as phred63
- except that phred range is [-5, 62] (hence "phred68")
- phred_type is int8_t
- offset_phred is -5
- assert that assigned values not larger than 62 and not smaller than -5
- document that this is for solexa and illumina < 1.8

Also, maybe we can add a small documentation overview to alphabet/quality.hpp that explains the differences...
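
To make the phred42 rules above concrete, here is a minimal sketch assuming a free-standing struct; `phred42_sketch`, `assign_phred`, `to_phred` and the two constants are illustrative names, not the SeqAn3 interface. It shows the unsigned `phred_type`, the clamp to 41 and the assertion on 62.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the proposed phred42; not the SeqAn3 implementation.
struct phred42_sketch
{
    using phred_type = std::uint8_t;                // only positive values are needed

    static constexpr phred_type max_phred = 41;     // phred range is [0, 41]
    static constexpr phred_type max_accepted = 62;  // anything larger is a logic error

    constexpr phred42_sketch & assign_phred(phred_type const p) noexcept
    {
        assert(p <= max_accepted);                  // checked in debug builds only
        rank = (p > max_phred) ? max_phred : p;     // values above 41 are mapped to 41
        return *this;
    }

    constexpr phred_type to_phred() const noexcept { return rank; }

    // Sanger and Illumina >= 1.8 encode phred 0 as '!' (ASCII 33).
    constexpr char to_char() const noexcept { return static_cast<char>(rank + 33); }

private:
    phred_type rank{};
};
```

Under the same assumptions, phred63 would differ only in `max_phred`, and phred68legacy would additionally need a signed `phred_type` and an offset of -5.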

gaps module clean-up

depends on #58

rename gapped_alphabet<> to gapped<> (it will be gapped<dna4> for example so it is clear¹)
remove set_gap and is_gap since one can now just do = gap::GAP and == gap::GAP in constant time
make gapped<> inherit more stuff from union_composition via using
since it is then only a few lines long, move it to alphabet/gap/gap.hpp
add gap submodule meta-header, either alphabet/gap.hpp or alphabet/gap/all.hpp depending on where the library is then
streamline documentation, in particular add \param documentation to functions and \ingroup to everything
deduplicate the test cases, add to the generic test cases
add noexcept where adequate

if you feel that these issues are not related, feel free to make separate PRs!

Update range-v3 submodule from 0.3.0 to 0.3.5

... and fix new warnings. Also update the documentation files to reflect that we now depend on the newer version!

@sarahet You already did this locally anyways, right? 😉 I found out there actually was a new release 10 days ago so we can bump this!

Argument parser - export of file format (ctd) for KNIME engine

Can be partly adapted from SeqAn2.

create alphabet/composition submodule

move alphabet/union_alphabet.hpp → alphabet/composition/union_composition.hpp
rename union_alphabet → union_composition
move alphabet/composition.hpp → alphabet/composition/cartesian_composition.hpp
rename alphabet_composition → cartesian_composition
create meta-header alphabet/composition.hpp (or all.hpp depending on where the rest of the library is) that includes both and defines a sub-module

Warning Levels

GCC warning levels

We have

	`-pedantic -Werror -Wall -Wextra`	`-Wconversion`	issues
SeqAn	☑	☐	this!
range-v3	☑	☐	ericniebler/range-v3/issues/680
SDSL	?	☐	xxsds/sdsl-lite/issues/26 xxsds/sdsl-lite/pull/17
cereal	☑	☐	USCiLab/cereal/issues/434
umeSIMD	?	?	@marehr

~~Cereal is less important because it is optional and we include it after seqan headers (so we can selectively disable warnings).~~

Clang

Not relevant, yet. As soon as we have it, we will need -pedantic -Werror -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic

Open issues container module

concepts/iterators.hpp, then change concepts/ranges.hpp to be based on iterators directly?

figure out a good way of testing custom views

auto const foobar = ranges::view::transform([] (auto const & c) { return c * 2; });

The problem is that foobar can't be constexpr (some limitation in the ranges library) so it's not easy to test.

Our random access iterator adaptor has broken equality semantics

According to this article the equality comparison between iterator and const_iterator obtained from the same container must be valid.
However, our adaptor fails when comparing iterator with const_iterator.
Furthermore, our tests check behaviour that does not conform to the standard:
the comparison of two iterators over two different ranges results in undefined behaviour. (see discussion)

gaps module after clean-up

Fixed in #158: gapped shouldn't implement seqan3::alphabet_concept directly (documentation-wise), because it inherits from union_composition (which does)
Fixed in #158: why is gap deactivated for the generic alphabet_test?
Fixed in #127, #133: also the gap module and the composition module create lots of these warnings:

</home/mi/h4nn3s/devel/seqan3/include/seqan3/alphabet/composition/>:2: warning: Found unknown command `\group'
</home/mi/h4nn3s/devel/seqan3/include/seqan3/alphabet/gap/>:2: warning: Found unknown command `\group'
/home/mi/h4nn3s/devel/seqan3/include/seqan3/alphabet/composition/all.hpp:39: warning: Found unknown command `\group'
/home/mi/h4nn3s/devel/seqan3/include/seqan3/alphabet/gap/all.hpp:39: warning: Found unknown command `\group'
/home/mi/h4nn3s/devel/seqan3/include/seqan3/alphabet/composition/all.hpp:39: warning: Found unknown command `\group'
/home/mi/h4nn3s/devel/seqan3/include/seqan3/alphabet/gap/all.hpp:39: warning: Found unknown command `\group'

follow up of #59

concatenated_sequences uses more functionality of an iterator type than required

See:

seqan3/include/seqan3/range/container/concatenated_sequences.hpp

Lines 997 to 1045 in 5404a17

    
template <typename begin_iterator_type, typename end_iterator_type>
iterator insert(const_iterator pos, begin_iterator_type first, end_iterator_type last)
//!\cond
    requires forward_iterator_concept<begin_iterator_type> &&
             compatible_concept<begin_iterator_type, concatenated_sequences>
             //&& sized_sentinel_concept<end_iterator_type, begin_iterator_type>
//!\endcond
{
    auto const pos_as_num = std::distance(cbegin(), pos);
    // TODO SEQAN_UNLIKELY
    if (last - first == 0)
        return begin() + pos_as_num;

    auto const ilist = ranges::make_iterator_range(first, last, std::distance(first, last));
    data_delimiters.reserve(data_values.size() + ilist.size());
    data_delimiters.insert(data_delimiters.cbegin() + pos_as_num,
                           ilist.size(),
                           *(data_delimiters.cbegin() + pos_as_num));

    // adapt delimiters of inserted region
    size_type full_len = 0;
    for (size_type i = 0; i < ilist.size(); ++i, ++first)
    {
        // constant for sized ranges and/or random access ranges, linear otherwise
        if constexpr (sized_range_concept<std::decay_t<decltype(*first)>>)
            full_len += ranges::size(*first);
        else
            full_len += std::distance(ranges::begin(*first), ranges::end(*first));

        data_delimiters[pos_as_num + 1 + i] += full_len;
    }

    // adapt values of inserted region
    auto concatenated = ilist | ranges::view::join | ranges::view::bounded;
    data_values.reserve(data_values.size() + full_len);
    data_values.insert(data_values.cbegin() + data_delimiters[pos_as_num],
                       ranges::begin(concatenated),
                       ranges::end(concatenated));

    // adapt delimiters behind inserted region
    // TODO parallel execution policy or vectorization?
    std::for_each(data_delimiters.begin() + pos_as_num + ilist.size() + 1,
                  data_delimiters.end(),
                  [full_len] (auto & d) { d += full_len; });

    return begin() + pos_as_num;
}

It requires a forward_iterator, but last - first is only available for random_access_iterators.

We need a test case for this, where we use std::forward_list.

Where to place `test/` utility functions?

#120 introduced /test/include/seqan3_test/tmp_filename.hpp

I would like to discuss where the files should live and what the namespace is.

We could do it in the following ways:

put it into /include/seqan3/test/
put it into /include/seqan3_test/
put it into /test/include/seqan3/
put it into /test/include/seqan3_test/

I think the namespace seqan3::test is already good.

Open issues with doxygen

ordered by priority:

need a \complexity keyword for functions
requires statements in function signatures are ugly
user defined literals are completely ignored; or has this changed in a newer version?
concepts are treated as template variables, which is ugly
explicit constructor attribute not displayed.
concept documentation should be added to/displayed for classes fulfilling this concept
doxygen uses its own style, not the one in code, e.g.

insert (const_iterator pos, value_type const &value)
//                                            ^ there should be a space

pseudoknot_support should have the base class std::false_type

I don't know what pseudoknots are, but the current definition:

template<typename alphabet_type>
struct pseudoknot_support{};

template<typename alphabet_type>
constexpr bool pseudoknot_support_v = pseudoknot_support<alphabet_type>::value;

looks to me as if it should be is_pseudoknot and is_pseudoknot_v?

Whatever the name is it should inherit from std::false_type or std::true_type.

Any thoughts?
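
A minimal sketch of what the suggestion could look like, keeping the current name: the primary template derives from `std::false_type`, so `::value` always exists, and alphabets opt in by specialising with `std::true_type`. The two alphabet structs are placeholders for illustration only.

```cpp
#include <type_traits>

// Primary template: no pseudoknot support unless an alphabet opts in.
template <typename alphabet_type>
struct pseudoknot_support : std::false_type
{};

template <typename alphabet_type>
inline constexpr bool pseudoknot_support_v = pseudoknot_support<alphabet_type>::value;

// Hypothetical alphabets, for illustration only.
struct plain_structure_alphabet {};
struct knotted_structure_alphabet {};

// Opt-in for the alphabet that can represent pseudoknots.
template <>
struct pseudoknot_support<knotted_structure_alphabet> : std::true_type
{};

static_assert(!pseudoknot_support_v<plain_structure_alphabet>);
static_assert(pseudoknot_support_v<knotted_structure_alphabet>);
```

With the empty primary template from the issue, `pseudoknot_support_v<plain_structure_alphabet>` would be a hard error instead of `false`.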

alphabets using lookup tables may produce code which seg-faults and / or compiler errors in assign_char

A common implementation of assign_char looks like this:

using char_type = char;
//...
constexpr aa27 & assign_char(char_type const c) noexcept
{
    _value = char_to_value[c];
    return *this;
}

char is a signed type on most platforms, so character values above 127 wrap around to negative numbers (e.g. 128 becomes -128), and using such a value as an index is a negative index access to char_to_value.

This should be easily fixed by changing it to:

using char_type = char;
//...
constexpr aa27 & assign_char(char_type const c) noexcept
{
    using index_t = std::make_unsigned_t<char_type>;
    _value = char_to_value[static_cast<index_t>(c)];
    return *this;
}

alphabet_test should add this corner case.
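
A sketch of such a corner-case check, assuming a toy alphabet with a 256-entry lookup table; `table_alphabet_sketch` and its table contents are invented for the example and do not mirror aa27. The checks feed in characters whose value is negative when `char` is signed.

```cpp
#include <array>
#include <cstdint>
#include <type_traits>

// Hypothetical table-based alphabet; only the indexing scheme matters here.
struct table_alphabet_sketch
{
    using char_type = char;

    constexpr table_alphabet_sketch & assign_char(char_type const c) noexcept
    {
        // Convert to the unsigned counterpart first so the index is in [0, 255].
        using index_t = std::make_unsigned_t<char_type>;
        value = char_to_value[static_cast<index_t>(c)];
        return *this;
    }

    constexpr std::uint8_t to_rank() const noexcept { return value; }

private:
    // One entry per possible char value; everything except 'C' maps to rank 0.
    static constexpr std::array<std::uint8_t, 256> char_to_value = [] () constexpr
    {
        std::array<std::uint8_t, 256> table{};
        table['C'] = 1;
        return table;
    }();

    std::uint8_t value{};
};

// Characters outside the ASCII range must not index out of bounds.
static_assert(table_alphabet_sketch{}.assign_char(static_cast<char>(200)).to_rank() == 0);
static_assert(table_alphabet_sketch{}.assign_char(static_cast<char>(-128)).to_rank() == 0);
```

Because the checks run in a constant expression, an out-of-bounds table access would be rejected by the compiler rather than silently reading garbage.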

alphabet is not working for pod's

The following is currently not possible:

static_assert(alphabet_concept<char>);

When trying to overload the basic structures for char

template <>
struct alphabet_size<char>
{
    // the current definition is wrong, if all values of a type is used. We would need the next bigger data type to fit the value 
    // maybe we should change it to the maximum value a type can hold, because we don't need to use a bigger data type than the rank type
    static constexpr uint16_t value = 1 << (sizeof(char)*8);
};

template <>
struct underlying_rank<char>
{
    using type = char;
};

template <>
struct underlying_char<char>
{
    using type = char;
};

I get the error:

alphabet/concept.hpp:271:26: error: template constraint failure
 struct alphabet_size<char>
                          ^
alphabet/concept.hpp:271:26: note:   constraints not satisfied
alphabet/concept.hpp:271:26: note:     with ‘char c’
alphabet/concept.hpp:271:26: note: the required expression ‘alphabet_type::value_size’ would be ill-formed

The definition of alphabet_size is

template <typename alphabet_type>
    requires requires (alphabet_type c) { alphabet_type::value_size; }
struct alphabet_size;

So it seems, that you can't specialize structures that use concepts. That means we have to rewrite this?

Add a nice wrapper class to avoid implementing pipe-operator for custom views.

Please note, this is a design question/feature proposal.
Inspired by the nice tutorial about implementing your own ranges in the wiki, I was thinking that it would be great to take the burden off of the developers to always implement the pipe-operator for their custom views.
Especially, as it could be very confusing in the usage if stuff like std::bind together with placeholders is used to implement proper pipe-operator delegation for the respective view.
Hence, I came up with a CRTP pattern, that in general would allow us to implement a very handsome
my_view_fn class where I don't need to think about the implementation of the pipe-operator and these kind of stuff.
Of course some stuff has to be dealt with, like proper const-fication, but in general I think that would be the way to go.
It might be possible, that I wasn't thinking of some corner cases so this should be discussed here, and we need to see if we can update the proposal or if we have to withdraw completely.

[-note-begin-
Of course if the view had two constructors that both expect 2 arguments, and one of the constructors has a default parameter for the second argument, then there would probably be an ambiguous function call. Then again, the overloaded function call operator of view_fn could handle the default argument, so I think this design would still be applicable.
-end-]

But now enough talking. Lean back and enjoy! 😁

#include <iostream>
#include <vector>
#include <utility>
#include <memory>
#include <tuple>

namespace seqan3
{

// The factory that delegates the pipe to a custom call using the lhs of a |-operator
// implemented as a CRTP class.
template <typename derived_view_fn_t>
struct view_factory
{
    // A reference to the derived view_fn object.
    derived_view_fn_t & view_fn;

    // Construction.
    view_factory(derived_view_fn_t & derived) : view_fn(derived)
    {}

    // A special functor that gives us a named type for which we can overload the
    // pipe-operator. NOTE: I tried to use a lambda instead, but obviously, what should
    // the signature of the pipe operator look like? It would get two template parameters,
    // making it applicable for everything.
    template <typename... arg_ts>
    struct view_factory_fn
    {
        // Get access to the underlying view_fn.
        derived_view_fn_t & view_fn;

        // Store the variadic values somewhere.
        std::tuple<arg_ts...> member;

        // constructor.
        view_factory_fn(derived_view_fn_t & fn, arg_ts && ...args) :
            view_fn(fn),
            member{std::forward<arg_ts>(args)...}
        {}

        // helper class to unpack the tuple.
        template <typename rng_t, size_t... Is>
        auto explode(rng_t && rng, std::index_sequence<Is...> const &)
        {
            return view_fn(std::forward<rng_t>(rng), std::forward<arg_ts>(std::get<Is>(member))...);
        }

        // the operator called by the pipe-operator.
        auto operator()(auto && lhs_rng)
        {
            return explode(std::forward<decltype(lhs_rng)>(lhs_rng), std::make_index_sequence<sizeof...(arg_ts)>{});
        };
    };

    // the operator called by the derived class to get a proxy for calling the pipe-operator later on
    // without using the weird interface with std::bind.
    template <typename... arg_ts>
    auto operator()(arg_ts && ...args)
    {
        return view_factory_fn<arg_ts...>{view_fn, std::forward<arg_ts>(args)...};
    }

    // the actual magic pipe-operator with getting view_factory_fn on the rhs,
    // so that it does not get to be called for every type.
    template <typename rng_t, typename... arg_ts>
    friend auto operator|(rng_t && lhs, view_factory_fn<arg_ts...> && callable)
    {
        return callable(std::forward<rng_t>(lhs));
    }
};

// Some test view implementation that does not do a lot.
template <typename rng_t = void>
class my_view_impl
{
protected:

    std::shared_ptr<std::decay_t<rng_t>> data{};
    unsigned member1{};
public:
    my_view_impl() = default;

    my_view_impl(unsigned const m1) noexcept : member1(m1)
    {}

    my_view_impl(rng_t && rng, unsigned const m1) noexcept : data{std::make_shared<std::decay_t<rng_t>>(rng)}, member1{m1}
    {}

    unsigned get() const noexcept
    {
        return member1;
    }
};

// That would be an example .._fn implementation using the CRTP pattern,
// to get almost automatically access to the pipe notation.
struct my_view_fn : protected view_factory<my_view_fn>
{
    // the base class.
    using factory_t = view_factory<my_view_fn>;

    // the constructor.
    my_view_fn() : factory_t(*this)
    {};

    // Just tell the compiler there is a suitable operator() overload that apparently
    // does all the magic for us.
    using factory_t::operator();

    // Now specify what should happen if the final operator is called.
    // This allows us to use the functional call design or the pipe notation.
    template <typename rng_t>
    auto operator()(rng_t && rng, unsigned const p1)
    {
        return my_view_impl<rng_t>(std::forward<rng_t>(rng), p1);
    }
};

namespace view
{
// Well the code above is yet to be const-ified.
my_view_fn /*const*/ my_view;
}
}

using namespace seqan3;

// Some general use cases, that should work.
int main()
{
    { // use case 1: functional;
        std::vector<int> vec{{1, 2, 3, 4, 5}};
        auto v = view::my_view(vec, 4u);
        std::cout << v.get()  << "\n"; // should print 4
    }

    {  // use case 2: pipe version.
        std::vector<int> vec{{1, 2, 3, 4, 5}};
        auto v2 = vec | view::my_view(4u) | view::my_view(3u);
        std::cout << v2.get()  << "\n"; // should print 3
    }
}

union_composition clean-up: noexcept

This is a detail/discussion ticket of the meta ticket #60

add noexcept where adequate

raise cpp version requirement to C++17 again

... as soon as GCC7 is released and available to all our devs.
See #42 for more details.

where to put the module and submodule meta-includes

Currently we have

alphabet.hpp
alphabet/...

Instead we could:

alphabet/all.hpp
alphabet/...

~~The second is the model used by range-v3~~ and it has some benefits which is why I would suggest switching to it:

better encapsulation (every header that belongs to the (sub)module is in the folder), this also makes doxygen easier to understand because folders are identical with (sub)modules
clear distinction between "meta-headers" and real headers, e.g. by just looking at the files:
```
alphabet/concept.hpp
alphabet/nucleotide.hpp
```
you don't know whether nucleotide.hpp is a meta-header or a real header, without also going through the list of folders.

Travis changes

We need to

switch to a release version of gcc-7 (currently we use an older snapshot) [fixed with #113]
add a build with -DNO_CEREAL=1 so that we check that all code runs with and without cereal
add a build that builds the user documentation and fails if there is a warning
add a build that builds the developer documentation and fails if there is a warning

@rrahn and @marehr Can you look into this after the seqan2.4 dust has settled and #108 is resolved?

make core/concept/iterator_detail.hpp a proper test case

/include/seqan3/core/concept/iterator_detail.hpp tests core functionality within the header via static_asserts, this should be transformed into a test case.

Introduce new product development cycle

noexcept version/macro for function that use assert, i.e. only throw in debug mode

I looked into dna4.hpp and found that basically every function, except for assign_rank has noexcept

I assume that is due to assert. But I thought assert does not throw but raises SIGABRT instead? And when compiling with -DNDEBUG this code will NEVER throw/exit. Is there a macro for noexcept if -DNDEBUG?
-- marehr

That's all true. We need a different assert macro that throws; then we can make the noexcept property of assign_rank forward to the noexcept property of the assertion, which will be true in release mode and false in debug mode.
-- h-2
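
The idea from the reply could be sketched roughly as follows. `checks_can_throw`, `throwing_assert` and `rank_holder` are invented names, and a plain function stands in for the macro to keep the example short.

```cpp
#include <stdexcept>

// True in debug builds, false when compiled with -DNDEBUG.
#ifdef NDEBUG
inline constexpr bool checks_can_throw = false;
#else
inline constexpr bool checks_can_throw = true;
#endif

// Hypothetical throwing assertion: throws in debug mode, does nothing in release mode.
inline void throwing_assert(bool const condition, char const * const message) noexcept(!checks_can_throw)
{
#ifndef NDEBUG
    if (!condition)
        throw std::logic_error{message};
#else
    (void) condition;
    (void) message;
#endif
}

struct rank_holder
{
    unsigned char rank{};

    // Forwards the noexcept property of the assertion:
    // noexcept in release mode, potentially throwing in debug mode.
    rank_holder & assign_rank(unsigned char const r) noexcept(noexcept(throwing_assert(true, "")))
    {
        throwing_assert(r < 4, "rank out of range");
        rank = r;
        return *this;
    }
};

static_assert(noexcept(rank_holder{}.assign_rank(0)) == !checks_can_throw);
```

Unlike `assert`, a failing check can then be caught in a unit test instead of aborting the whole test binary.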

See original discussion:

union_composition clean-up

depends on #58

if you feel that these issues are not related, feel free to make separate PRs (i.e. not for everything, but for test VS doc for example) 😄

Bring aminoacid submodule up to state

add alphabet/aminoacid.hpp meta-header and introduce sub-module for aminoacid
remove alphabet/aminoacid/aa27_container.hpp
clean-up alphabet/aminoacid/aa27.hpp
- style-guide, documentation, make statics private, replace table with constexpr lambda...
add aa27 to the generic alphabet tests, reduce the aa27 test to the specific parts

Most of this can be easily adapted from the nucleotide submodule.

@sarahet, since you are working on this anyway, could you update it?

Add test case for seqan3::test::tmp_file_name

Test filename = nullptr
Test if temporary file will be deleted again
Test multiple creations of temporary files and whether they are deleted again
Should not be copy constructible

deep views

In SeqAn we will need/want recursive views quite often, or more precisely, views that modify the innermost range, not the outer range.
This is because we often operate on collections of strings/vectors, e.g. we could have std::vector<dna4_vector> and want to do view::complement. Of course a vector-of-vectors has no complement, so we want the view to do this for the inner vectors. In other cases, e.g. reverse, it is not so clear, because you could reverse either the outer range or the inner ranges.

I would propose

all views where it is clear, e.g. views that operate on alphabets, should automatically "recurse"; this will be documented!
all other views where we need this often enough, get a foo_r sibling, that has the behaviour, e.g. view::reverse_r

Alternatively, we would never do it automatically, and create _r version for all our views.

@seqan/all what do you think?

documentation of views

view properties need to be improved so that people understand the consequences
most current views don't have \ingroup view set, yet
remove Complexity
remove Thread safety and Exception?

create seqan3/core/filesystem.hpp

with roughly this content which is currently in test/include/seqan3/test/tmp_file_name.hpp

#if __has_include(<filesystem>)
#include <filesystem>
namespace seqan3
{
namespace filesystem = std::filesystem; // an alias name must be unqualified
} // namespace seqan3
#else
#include <experimental/filesystem>
namespace seqan3
{
namespace filesystem = std::experimental::filesystem;
} // namespace seqan3
#endif // __has_include(<filesystem>)

Thereby we can just include <seqan3/core/filesystem.hpp> instead of doing a check for the header and also we can use our sub-namespace instead of doing a check for the namespace.

Also add a check to platform.hpp that either <filesystem> or <experimental/filesystem> needs to exist.

Export code snippets into files that can be tested

As done in seqan2, maybe we should not hardcode cpp example code in the documentation but instead include separate files. Those demo files might then be tested by the testing framework and this ensures that the cpp examples always work.

@xenigmax Does Doxygen support something like \include mycode.cpp?

@seqan/all

Write wiki entry for HowTo write Doxygen

Some of the fixes of the documentation seem not trivial.
It would be good to add a good practice documentation on the developer wiki, so that others do not run into the same issues as a follow up for #127.

Known issues

issue with \relates statement
issue unresolvable references, e.g. typename seqan3::underlying_char<alphabet_type>::type
\defgroup gap ➡️ \defgroup gap Gap
operator "" _aa27s ➡️ operator""_aa27s

union_composition clean-up: documentation

This is a detail/discussion ticket of the meta ticket #60

add \param documentation to all functions
don't use namespace seqan3::detail::union_alphabet since it doesn't appear in doxygen
don't use \privatesection and \publicsection outside of class scope (it doesn't have any effect there)
add \ingroup composition where needed!

Support concept for doxygen.

Hello.

There are 4 things regarding Concept according to the issue #72.

Use cond syntax to let Doxygen skip parsing Concept part. (since it's not able to parse them)
https://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdcond
Use interface command to manually link Concepts to the document (it will be shown as C++ interface)
http://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdinterface
Use implement and extend commands to manually show inheritance information of Concepts.
https://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdimplements
https://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdextends
More explanation regarding functions defined in the concept block. (They will not be shown by 1 )

You can check the detailed explanation in #72.

I tried to do that by myself as requested but found that it's not only a burden for me but also very risky.
Unfortunately, I was not involved in many of the design discussions, and it is not easy to figure out all the inheritance relationships between concepts.

Instead, I listed all the source files that contain 'concept' using grep, and made a list with author names. Please understand that I'm not able to look at all this code and figure out how to link it correctly.
Perhaps it's not a big deal for the original authors (unlike for me). It does not require any hacks or infrastructure-related changes. Please just change the comments that you wrote and claimed as author when you have time.

Since the list was made with grep, there can be some files that do not require any modification. If that's the case, please just check the checkbox. And please let me know about incorrect author assignments.

Sara @sarahet

alphabet/nucleotide/nucl16.hpp
alphabet/aminoacid/aa27.hpp

David & Hannes @eldariont

Marcel & David @marehr

alphabet/gap/gap.hpp
alphabet/gap/gapped_alphabet.hpp
alphabet/composition/union_composition.hpp
alphabet/composition/cartesian_composition.hpp

Rene @rrahn

Joerg @joergi-w

io/sequence/sequence_file_format.hpp

Hannes @h-2

@mariehoffmann

alphabet/quality/composition.hpp
alphabet/quality/concept.hpp
alphabet/quality/aliases.hpp
alphabet/quality/illumina18.hpp
alphabet/quality.hpp

No author (probably not completed yet)

io/alignment/align_file_in.hpp
io/alignment/align_file.hpp
io/alignment/align_file_out.hpp
io/alignment/align_file_detail.hpp
io/sequence/sequence_file_in.hpp
io/sequence/sequence_file_format_fasta.hpp

Remove string typedefs and literals from alphabet module

Remove things like dna4_string and operator""_dna4s, because we just want people to use dna4_vector.

Difficulty: easy!

<=> operator makes <, >, <=, >=, ==, != redundant

We can remove a lot of code within the alphabet module, because most alphabets are trivially comparable.

http://en.cppreference.com/w/cpp/language/default_comparisons:

Provides a way to request the compiler to generate consistent relational operators for a class.

In brief, a class that defines operator<=> automatically gets compiler-generated operators ==, !=, <, <=, >, and >=. A class can define operator<=> as defaulted, in which case the compiler will also generate the code for that operator.

class Point {
 int x;
 int y;
public:
 auto operator<=>(const Point&) const = default;
 // ... non-comparison functions ...
};
 
Point pt1, pt2;
if (pt1 == pt2) { /*...*/ } // ok
set<Point> s; // ok
s.insert(pt1); // ok
if (pt1 <= pt2) { /*...*/ } // ok, makes only a single call to <=>

add soft dependency on LEMON

Doxygen \file should be empty if it is in the same file

Currently we use mostly the \file <filepath> syntax.

For example in file /include/seqan3/alphabet/nucleotide/rna4.hpp

/*!\file alphabet/nucleotide/rna4.hpp
 * [...]
 */

which is basically the absolute path without the /include/seqan3/ part.

In /include/seqan3/alphabet/range.hpp and /include/seqan3/core/platform.hpp we only use the filename itself.

/*!\file platform.hpp
 * [...]
 */

I propose to use the \file syntax (without <filepath>), because according to https://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdfile:

If the file name is omitted (i.e. the line after \file is left blank) then the documentation block that contains the \file command will belong to the file it is located in.

Manual testing showed that it will produce the same doc as by providing a <filepath>.

A pro argument is:

You can't forget to rename the <filepath> part, after renaming the actual file

A contra argument:

If you see \file alphabet/nucleotide/rna4.hpp, even if you don't know what \file means, you can guess that the following documentation applies only to the file alphabet/nucleotide/rna4.hpp, whereas a bare \file may not be descriptive enough

Discuss whether to give up the seqan3::literal namespace

We originally chose to move our literals into seqan3::literal because the STL does so also. For the STL the reasoning is that many people do using namespace std; and then the std literals might conflict with others.
For SeqAn this is not true from my POV. I don't think many people will do
using namespace seqan3; and also using namespace other_sequencelib; that by chance overloads the same literals as we do. In fact we lose nothing by just not supporting this setup.

And it would improve usability for our users, because they can call "ACGT"_dna4 without having to include an extra namespace.
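
To make the difference concrete, here is a small sketch with two made-up namespaces, nested_demo and flat_demo, standing in for the two layouts. Nothing here is SeqAn3 code, and the literal operator maps unknown characters to A purely to keep the example short. Note that literal operators are only found by unqualified lookup, so some using-directive or using-declaration is needed in both layouts; the question is only whether a second one is required.

```cpp
#include <cstddef>
#include <vector>

namespace nested_demo
{
enum class dna4 : unsigned char { A, C, G, T };

// Layout 1: literals live in their own nested namespace (like std::literals).
namespace literal
{
inline std::vector<dna4> operator""_dna4(char const * s, std::size_t n)
{
    std::vector<dna4> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(s[i] == 'C' ? dna4::C : s[i] == 'G' ? dna4::G : s[i] == 'T' ? dna4::T : dna4::A);
    return out;
}
} // namespace literal
} // namespace nested_demo

namespace flat_demo
{
using nested_demo::dna4;

// Layout 2: the literal sits directly in the library namespace.
inline std::vector<dna4> operator""_dna4(char const * s, std::size_t n)
{
    return nested_demo::literal::operator""_dna4(s, n);
}
} // namespace flat_demo

inline std::size_t with_nested_namespace()
{
    using namespace nested_demo;
    using namespace nested_demo::literal; // the extra line under discussion
    return "ACGT"_dna4.size();
}

inline std::size_t with_flat_namespace()
{
    using namespace flat_demo; // one directive is enough
    return "ACGT"_dna4.size();
}
```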

don't force tests to be in release mode and activate warnings

remove line 7 from test/CMakeLists.txt so that tests can be built with DEBUG.
add ~~-Wall -Wshorten-64-to-32~~ -Wall -Wconversion to the CMAKE_CXX_FLAGS
fix the warnings you then encounter :)
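
For orientation, this is the kind of code -Wconversion reports and one possible way to fix it. The function alphabet_size is invented for this example and does not exist in the codebase.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <vector>

// With -Wconversion, GCC warns about the following version, because
// std::size_t is silently narrowed to std::uint8_t on return:
//
//     std::uint8_t alphabet_size(std::vector<char> const & letters)
//     {
//         return letters.size();
//     }

// One possible fix: check the range, then convert explicitly.
inline std::uint8_t alphabet_size(std::vector<char> const & letters)
{
    if (letters.size() > std::numeric_limits<std::uint8_t>::max())
        throw std::length_error{"alphabet too large for an 8-bit rank"};

    return static_cast<std::uint8_t>(letters.size());
}
```

Whether a range check or a plain static_cast is appropriate depends on the call site; the check is shown here only because it makes the narrowing visible.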

@cpockrandt since you have been absent from development for a while: this will give you a good overview of the current codebase 😉

Discuss whether to give up the <alphabet>_vector aliases

Should we remove the type alias dna4_vector, dna5_vector,...?

Observations:

We don't need those type aliases within the library code itself, because we will always assume containers of alphabet.
They are simply convenient for our users and for our example codes

Pro arguments for removing:

They are not much shorter than std::vector<dna4>
Without the alias, the true type is not hidden behind <alphabet>_vector (easier for beginners to understand)
No (future) developer will assume that the backend of the container is always vector

Pro argument for keeping:

It is convenient
It enforces a certain structure on our users (they will adopt dna4_vector from our examples)
...

@rrahn commented on #146 (review):

The main obstacle IMPOV is that we should drop the dna|rna alias and maybe also the dna\d+_vector as there seems to be no benefit over std::vector<dna\d+>, does it? But we can discuss this offline.

@h-2 said on #146 (comment):

Hm, I would prefer not to. It's really long. Maybe even something shorter, like dna4vec? But that would go against our principle of not abbreviating... 🤔
I really want something short, though, because it is written often.
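
For reference, a minimal sketch of what is being discussed. The enum dna4 is a stand-in for the real alphabet and count_letter is a made-up helper. The alias is only a second name for the same type, and generic code written against "a container of alphabet" works with either spelling.

```cpp
#include <algorithm>
#include <cstddef>
#include <type_traits>
#include <vector>

namespace demo
{
enum class dna4 : unsigned char { A, C, G, T }; // stand-in, not seqan3::dna4

using dna4_vector = std::vector<dna4>; // the kind of alias under discussion
} // namespace demo

// The alias introduces no new type.
static_assert(std::is_same_v<demo::dna4_vector, std::vector<demo::dna4>>);

// Library-style code does not need the alias: it accepts any container of letters.
template <typename container_t>
std::size_t count_letter(container_t const & sequence, typename container_t::value_type letter)
{
    return static_cast<std::size_t>(std::count(sequence.begin(), sequence.end(), letter));
}
```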

union_composition clean-up: tests

This is a detail/discussion ticket of the meta ticket #60

remove redundancy, add to generic test cases
don't test _value
tests may only depend on previously tested behaviour, there must be no circular dependencies
don't use static_asserts, instead use TESTs

Question: For the Dna alphabet, is ACGT stored in 4 bytes or one?

I am a newbie.

Is ACGT stored in 4 bytes or one?

If I have an alphabet of size 2^k, I want to store each symbol in k bits. In particular, I want to be able to pack them into as few bytes as possible. I would also like k to be specified at run time.

Where do I start looking at the code if the answer to the above question is yes?

Mark Stankus
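
To illustrate the packing asked about here (not how SeqAn3 itself stores dna4, which this thread does not state), below is a minimal sketch of a container that stores each symbol in k bits with k chosen at run time. packed_symbols is a made-up class. SDSL, which SeqAn3 already depends on, provides sdsl::int_vector<> with a run-time bit width and may be a better starting point than rolling your own.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

namespace demo
{
// Hypothetical helper: n symbols of k bits each (1 <= k <= 32), packed back to back
// into 64-bit words. Four 2-bit symbols such as ACGT therefore occupy 8 bits of one word.
class packed_symbols
{
public:
    packed_symbols(std::size_t n, unsigned k) : bits_{k}, size_{n}
    {
        if (k == 0 || k > 32)
            throw std::invalid_argument{"k must be in [1, 32]"};
        words_.assign((n * k + 63) / 64, 0);
    }

    std::size_t size() const { return size_; }
    std::size_t word_count() const { return words_.size(); }

    void set(std::size_t i, std::uint64_t value)
    {
        std::size_t const bit = i * bits_;
        std::size_t const word = bit / 64;
        unsigned const offset = static_cast<unsigned>(bit % 64);
        std::uint64_t const mask = (std::uint64_t{1} << bits_) - 1;
        value &= mask;

        words_[word] = (words_[word] & ~(mask << offset)) | (value << offset);
        if (offset + bits_ > 64) // symbol straddles a word boundary
        {
            unsigned const fit = 64 - offset; // bits that went into the first word
            words_[word + 1] = (words_[word + 1] & ~(mask >> fit)) | (value >> fit);
        }
    }

    std::uint64_t get(std::size_t i) const
    {
        std::size_t const bit = i * bits_;
        std::size_t const word = bit / 64;
        unsigned const offset = static_cast<unsigned>(bit % 64);
        std::uint64_t const mask = (std::uint64_t{1} << bits_) - 1;

        std::uint64_t value = words_[word] >> offset;
        if (offset + bits_ > 64)
            value |= words_[word + 1] << (64 - offset);
        return value & mask;
    }

private:
    unsigned bits_;
    std::size_t size_;
    std::vector<std::uint64_t> words_;
};
} // namespace demo
```

The sketch does no bounds checking on i and rounds storage up to whole 64-bit words, so it trades a few bytes of padding for simple indexing.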

make core/concept/core_detail.hpp a proper test case

/include/seqan3/core/concept/core_detail.hpp tests core functionality within the header via static_asserts, this should be transformed into a test case.

Doxygen warnings

We really need to get rid of the warnings so that we have a clean state once and for all.

This is the list of files and responsible people:

@h-2:

seqan3/include/seqan3/alphabet/adaptation/all.hpp
seqan3/include/seqan3/alphabet/concept_pre.hpp
seqan3/include/seqan3/alphabet/nucleotide/all.hpp
seqan3/include/seqan3/alphabet/nucleotide/dna4.hpp
seqan3/include/seqan3/alphabet/nucleotide/dna5.hpp
seqan3/include/seqan3/alphabet/nucleotide/nucl16.hpp
seqan3/include/seqan3/alphabet/nucleotide/rna4.hpp
seqan3/include/seqan3/alphabet/nucleotide/rna5.hpp
seqan3/include/seqan3/alphabet/all.hpp

@rrahn:

seqan3/include/seqan3/core/concept/core_detail.hpp
seqan3/include/seqan3/core/concept/iterator_detail.hpp

@sarahet:

seqan3/include/seqan3/alphabet/aminoacid/aa27.hpp
seqan3/include/seqan3/alphabet/aminoacid/all.hpp

@marehr:

seqan3/include/seqan3/alphabet/composition/all.hpp
seqan3/include/seqan3/alphabet/gap/all.hpp
seqan3/include/seqan3/alphabet/gap/gap.hpp

@mariehoffmann:

seqan3/include/seqan3/alphabet/quality/concept.hpp
seqan3/include/seqan3/alphabet/quality/illumina18.hpp
seqan3/include/seqan3/container.hpp

These files should be excluded from doxygen for now:

seqan3/include/seqan3/io/alignment/align_file_detail.hpp
seqan3/include/seqan3/io/alignment/align_file_in.hpp
seqan3/include/seqan3/io/alignment/align_file.hpp
seqan3/include/seqan3/io/sequence/sequence_file_format.hpp
seqan3/include/seqan3/io/sequence/sequence_file_in.hpp

Add proper file I/O Exceptions

We need to port some file I/O exceptions.

FileOpenError - error indicating that a file could not be opened.
ParseError - error indicating that parsing the file failed.
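
A minimal sketch of what the two exceptions could look like. The snake_case names file_open_error and parse_error, the demo namespace and the helper check_fasta_header are assumptions made for this example; the issue only names FileOpenError and ParseError.

```cpp
#include <stdexcept>
#include <string>

namespace demo
{
// Thrown when a file could not be opened.
struct file_open_error : std::runtime_error
{
    using std::runtime_error::runtime_error; // inherit the message constructors
};

// Thrown when parsing the file failed.
struct parse_error : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Hypothetical usage: a FASTA record is expected to start with '>'.
inline void check_fasta_header(std::string const & line)
{
    if (line.empty() || line.front() != '>')
        throw parse_error("Expected '>' at the start of a FASTA record, got: " + line);
}
} // namespace demo
```

Deriving from std::runtime_error keeps both types catchable through std::exception while still letting callers distinguish "could not open" from "could not parse".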

structure alphabets

I think we should add the structure formats that are alphabets also to the alphabet module.

The following would make sense from my POV:

alphabet/structure/all.hpp
alphabet/structure/concept.hpp
alphabet/structure/dot_bracket3.hpp    // rna: . ( )
alphabet/structure/wuss18.hpp    // rna: .,;<>(){}[]AaBb.-_
alphabet/structure/dssp9.hpp   // protein: HGIEBTSCX

Wuss could probably be bigger as well since all character pairs are allowed, I think...
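
As a starting point for discussion, a structure alphabet such as dot_bracket3 could be sketched like this. The member names value_size, to_char and assign_char and the fallback of unknown characters to '.' are assumptions for this example, not a description of the existing alphabet interface.

```cpp
#include <array>
#include <cstdint>

namespace demo
{
// Hypothetical three-letter RNA structure alphabet: unpaired, opening and closing bracket.
struct dot_bracket3
{
    static constexpr std::uint8_t value_size = 3;
    static constexpr std::array<char, 3> rank_to_char{'.', '(', ')'};

    std::uint8_t rank{0};

    constexpr char to_char() const noexcept
    {
        return rank_to_char[rank];
    }

    // Assumption: characters outside ".()" are mapped to '.' (unpaired).
    constexpr dot_bracket3 & assign_char(char c) noexcept
    {
        if (c == '(')
            rank = 1;
        else if (c == ')')
            rank = 2;
        else
            rank = 0;
        return *this;
    }
};
} // namespace demo
```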

@joergi-w What do you think? Can you work on this?

Make all seqan3 objects "streamable".

I think alphabet strings like dna4_string should be streamable, since there are several use cases.

Besides the obvious std::cout << my_dna_string, I might also want to stream into a dna4_string via a std::istringstream. In the (admittedly less likely) case that a dna4_string is user-defined command line input (e.g. "give a promoter sequence"), this is currently not possible, because I require option input types to be std::istringstream convertible.

@h-2, @joergi-w, @marehr what do you think? (as git blames you as authors of the alphabet module)

template <typename begin_iterator_type, typename end_iterator_type>
iterator insert(const_iterator pos, begin_iterator_type first, end_iterator_type last)
//!\cond
    requires forward_iterator_concept<begin_iterator_type> &&
             compatible_concept<begin_iterator_type, concatenated_sequences>
             //&& sized_sentinel_concept<end_iterator_type, begin_iterator_type>
//!\endcond
{
    auto const pos_as_num = std::distance(cbegin(), pos);
    // TODO SEQAN_UNLIKELY
    if (last - first == 0)
        return begin() + pos_as_num;

    auto const ilist = ranges::make_iterator_range(first, last, std::distance(first, last));

    data_delimiters.reserve(data_values.size() + ilist.size());
    data_delimiters.insert(data_delimiters.cbegin() + pos_as_num,
                           ilist.size(),
                           *(data_delimiters.cbegin() + pos_as_num));

    // adapt delimiters of inserted region
    size_type full_len = 0;
    for (size_type i = 0; i < ilist.size(); ++i, ++first)
    {
        // constant for sized ranges and/or random access ranges, linear otherwise
        if constexpr (sized_range_concept<std::decay_t<decltype(*first)>>)
            full_len += ranges::size(*first);
        else
            full_len += std::distance(ranges::begin(first), ranges::end(first));

        data_delimiters[pos_as_num + 1 + i] += full_len;
    }

    // adapt values of inserted region
    auto concatenated = ilist | ranges::view::join | ranges::view::bounded;
    data_values.reserve(data_values.size() + full_len);
    data_values.insert(data_values.cbegin() + data_delimiters[pos_as_num],
                       ranges::begin(concatenated),
                       ranges::end(concatenated));

    // adapt delimiters behind inserted region
    // TODO parallel execution policy or vectorization?
    std::for_each(data_delimiters.begin() + pos_as_num + ilist.size() + 1,
                  data_delimiters.end(),
                  [full_len] (auto & d) { d += full_len; });

    return begin() + pos_as_num;
}
