seqan / seqan3

The modern C++ library for sequence analysis. Contains version 3 of the library and API docs.

Home Page: https://www.seqan.de

License: Other

C++ 97.76% CMake 2.07% Perl 0.03% Shell 0.13%
sequence-analysis seqan cpp17 cpp20 cpp-concepts bioinformatics blast sequence-alignment fasta fastq

seqan3's Introduction

SeqAn3 -- the modern C++ library for sequence analysis

SeqAn3 is the new version of the popular SeqAn template library for the analysis of biological sequences. It enables the rapid development of high-performance solutions by providing generic algorithms and data structures for:

  • sequence representation and transformation
  • full-text indexing and efficient search
  • sequence alignment
  • input/output of common file formats

By leveraging Modern C++ it provides unprecedented ease-of-use without sacrificing performance.

Please see the online documentation for more details.

Quick facts

  • C++ header-only library: easy to integrate with your app & easy to distribute
  • liberal open source license: allows integration with any app or library, requires only attribution
  • very high code quality standards: >97% unit test coverage, performance regression tests, ...
  • extensive API documentation & tutorials: more lines of documentation than lines of code
  • aims to support any 64-bit architecture running Linux/POSIX; currently big-endian CPU architectures like s390x are less supported

Dependencies

requirement     version           comment
compiler        GCC ≥ 11          no other compiler is currently supported!
build system    CMake ≥ 3.5       optional, but recommended
required libs   SDSL ≥ 3.0.3
optional libs   cereal ≥ 1.3.1    required for serialisation and CTD support
                zlib ≥ 1.2        required for *.gz and .bam file support
                bzip2 ≥ 1.0       required for *.bz2 file support

Usage

We recommend that you use CMake to build your project:

  • Setup-Tutorial
  • Using CMake guarantees that all optional dependencies are automatically detected and activated.

Quick-Setup without CMake:

  • Clone the repository with submodules: git clone --recurse-submodules https://github.com/seqan/seqan3.git
  • Add the following to your compiler invocation:
    • the include directories of SeqAn and its dependencies
    • C++20 mode
    • Macros indicating the presence of zlib and bzip2 (set only if actually available in your paths!)
  • The command could look like this:
g++-11 -O3 -DNDEBUG -Wall -Wextra                               \
    -std=c++20                                                  \
    -I       /path/to/seqan3/include                            \
    -isystem /path/to/seqan3/submodules/sdsl-lite/include       \
    -isystem /path/to/seqan3/submodules/cereal/include          \
    -DSEQAN3_HAS_ZLIB=1 -DSEQAN3_HAS_BZIP2=1                    \
    -lz -lbz2 -pthread                                          \
  your_file.cpp

Sponsorships

Vercel

Vercel is kind enough to sponsor our documentation preview-builds within our pull requests. Check them out!

seqan3's People

Contributors

clemapfel, cpockrandt, dependabot[bot], eaasna, eldariont, eseiler, h-2, irallia, jensuweulrich, joergi-w, joshuak94, lhovo, marehr, mariehoffmann, mitradarja, mr-c, pirovc, qasim-at-tci, qpcr4vir, remyschwab, rrahn, sarahet, seqan-actions, sgssgene, simonsasse, smehringer, svnbgnk, tloka, tsnorri, wvandertoorn

seqan3's Issues

wiki: document concepts as doxygen interfaces

A good workaround for doxygen not supporting concepts seems to be excluding them from the documentation build via \cond and \endcond and then defining them via the \interface command. It is also possible to define the required functions on this concept as members or related functions of that concept.

Then we can also mark other types as \implements concept_name and have concepts that refine other concepts specify \extends other_concept. This should provide for really nice documentation!

Here is an example of the alphabet_concept:

/*!\interface seqan3::alphabet_concept <>
 * \brief The generic alphabet concept that covers most data types used in ranges.
 * \ingroup alphabet
 *
 * The requirements for this concept are given as related functions and metafunctions.
 * Types that satisfy this concept are shown as "implementing this interface".
 */
/*!\fn auto seqan3::to_char(alphabet_concept const c)
 * \brief Returns the alphabet letter's value in character representation.
 * \relates seqan3::alphabet_concept
 * \param c The alphabet letter that you wish to convert to char.
 * \returns The letter's value in the alphabet's char type (seqan3::underlying_char).
 * ...
 */
//!\cond
template <typename t>
concept bool alphabet_concept = requires (t t1, t t2)
{
    // StL concepts
    requires std::is_pod_v<t> == true;
    requires std::is_swappable_v<t> == true;

    // static data members
    alphabet_size<t>::value;

    // conversion to char and rank
    { to_char(t1) } -> underlying_char_t<t>;
    { to_rank(t1) } -> underlying_rank_t<t>;
    { std::cout << t1 };

    // assignment from char and rank
    { assign_char(t1,  0) } -> t &;
    { assign_rank(t1,  0) } -> t &;
    { assign_char(t{}, 0) } -> t &&;
    { assign_rank(t{}, 0) } -> t &&;

    // required comparison operators
    { t1 == t2 } -> bool;
    { t1 != t2 } -> bool;
    { t1 <  t2 } -> bool;
    { t1 >  t2 } -> bool;
    { t1 <= t2 } -> bool;
    { t1 >= t2 } -> bool;
};
//!\endcond

@xenigmax can you do this for the existing concept definitions (alphabet, nucleotide...)? For the general concepts (range, core...) there doesn't need to be any function or requirement definition, but the correct inheritance/extension would be nice.

Quality alphabet TODOs

I thought about the name illumina18 again, and I think it is confusing, because for the other alphabets the number at the end denotes the size. Also, the current illumina18 can represent the original Sanger qualities as well.

I would therefore propose:

  • phred42:

    • rename of the current illumina18
    • phred_type should be changed to uint8_t since it is only positive
    • documentation should indicate that it can represent illumina >= 1.8 and original sanger
    • phred range is [0, 41] (hence "phred42")
    • values larger than 41 should be mapped to 41; also assert that values are not larger than 62
    • offset_phred can be removed
  • phred63:

    • same as phred42
    • except that phred range is [0, 62] (hence "phred63")
    • only assert values not larger than 62
  • phred68legacy:

    • same as phred63
    • except that phred range is [-5, 62] (hence "phred68")
    • phred_type is int8_t
    • offset_phred is -5
    • assert that assigned values are not larger than 62 and not smaller than -5
    • document that this is for solexa and illumina < 1.8

Also, maybe we can add a small documentation overview to alphabet/quality.hpp that explains the differences...
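To make the phred42 proposal concrete, here is a minimal sketch of the clamping behaviour. The type name `phred42_sketch`, its member names, and the character offset '!' (ASCII 33, the usual Sanger / Illumina ≥ 1.8 encoding) are assumptions for illustration and not the library's actual interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the proposed phred42 semantics.
struct phred42_sketch
{
    using phred_type = std::uint8_t;                 // only non-negative values

    static constexpr phred_type max_phred = 41;      // phred range is [0, 41]
    static constexpr phred_type max_input = 62;      // larger inputs are considered a bug
    static constexpr char offset_char = '!';         // assumed Phred+33 encoding

    phred_type value{};

    constexpr phred42_sketch & assign_phred(phred_type const p) noexcept
    {
        assert(p <= max_input);                      // debug-only check
        value = std::min(p, max_phred);              // map (41, 62] down to 41
        return *this;
    }

    constexpr phred_type to_phred() const noexcept { return value; }

    constexpr char to_char() const noexcept
    {
        return static_cast<char>(offset_char + value);
    }
};
```

A phred63 variant would differ only in `max_phred` (62), and phred68legacy would additionally use a signed `phred_type` with an offset of -5.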

gaps module clean-up

depends on #58

  • rename gapped_alphabet<> to gapped<> (it will be gapped<dna4> for example so it is clear¹)
  • remove set_gap and is_gap since one can now just do = gap::GAP and == gap::GAP in constant time
  • make gapped<> inherit more stuff from union_composition via using
  • since it is then only a few lines long, move it to alphabet/gap/gap.hpp
  • add gap submodule meta-header, either alphabet/gap.hpp or alphabet/gap/all.hpp depending on where the library is then
  • streamline documentation, in particular add \param documentation to functions and \ingroup to everything
  • deduplicate the test cases, add to the generic test cases
  • add noexcept where adequate

if you feel that these issues are not related, feel free to make separate PRs!

Update range-v3 submodule from 0.3.0 to 0.3.5

... and fix new warnings. Also update the documentation files to reflect that we now depend on the newer version!

@sarahet You already did this locally anyways, right? 😉 I found out there actually was a new release 10 days ago so we can bump this!

create alphabet/composition submodule

  • move alphabet/union_alphabet.hpp → alphabet/composition/union_composition.hpp
  • rename union_alphabet → union_composition
  • move alphabet/composition.hpp → alphabet/composition/cartesian_composition.hpp
  • rename alphabet_composition → cartesian_composition
  • create meta-header alphabet/composition.hpp (or all.hpp depending on where the rest of the library is) that includes both and defines a sub-module

Warning Levels

GCC warning levels

We have

flags: -pedantic -Werror -Wall -Wextra -Wconversion

issues per project:

  • SeqAn: this!
  • range-v3: ericniebler/range-v3/issues/680
  • SDSL: ? xxsds/sdsl-lite/issues/26 xxsds/sdsl-lite/pull/17
  • cereal: USCiLab/cereal/issues/434
  • umeSIMD: ? ? @marehr

Cereal is less important because it is optional and we include it after seqan headers (so we can selectively disable warnings).

Clang

Not relevant, yet. As soon as we have it, we will need -pedantic -Werror -Weverything -Wno-c++98-compat -Wno-c++98-compat-pedantic

figure out a good way of testing custom views

auto const foobar = ranges::view::transform([] (auto const & c) { return c * 2; });

The problem is that foobar can't be constexpr (some limitation in the ranges library) so it's not easy to test.

Our random access iterator adaptor has broken equality semantics

According to this article, the equality comparison between an iterator and a const_iterator obtained from the same container must be valid.
However, our adaptor fails when comparing iterator with const_iterator.
Furthermore, our tests exercise behaviour that does not conform to the standard:
comparing two iterators over two different ranges results in undefined behaviour (see discussion).

gaps module after clean-up

  • Fixed in #158: gapped shouldn't implement seqan3::alphabet_concept directly (documentation-wise), because it inherits from union_composition (which does)
  • Fixed in #158: why is gap deactivated for the generic alphabet_test?
  • Fixed in #127, #133: also the gap module and the composition module create lots of these warnings:
</home/mi/h4nn3s/devel/seqan3/include/seqan3/alphabet/composition/>:2: warning: Found unknown command `\group'
</home/mi/h4nn3s/devel/seqan3/include/seqan3/alphabet/gap/>:2: warning: Found unknown command `\group'
/home/mi/h4nn3s/devel/seqan3/include/seqan3/alphabet/composition/all.hpp:39: warning: Found unknown command `\group'
/home/mi/h4nn3s/devel/seqan3/include/seqan3/alphabet/gap/all.hpp:39: warning: Found unknown command `\group'
/home/mi/h4nn3s/devel/seqan3/include/seqan3/alphabet/composition/all.hpp:39: warning: Found unknown command `\group'
/home/mi/h4nn3s/devel/seqan3/include/seqan3/alphabet/gap/all.hpp:39: warning: Found unknown command `\group'

follow up of #59

concatenated_sequences uses more functionality of an iterator type than required

See:

template <typename begin_iterator_type, typename end_iterator_type>
iterator insert(const_iterator pos, begin_iterator_type first, end_iterator_type last)
//!\cond
    requires forward_iterator_concept<begin_iterator_type> &&
             compatible_concept<begin_iterator_type, concatenated_sequences>
             //&& sized_sentinel_concept<end_iterator_type, begin_iterator_type>
//!\endcond
{
    auto const pos_as_num = std::distance(cbegin(), pos);

    // TODO SEQAN_UNLIKELY
    if (last - first == 0)
        return begin() + pos_as_num;

    auto const ilist = ranges::make_iterator_range(first, last, std::distance(first, last));

    data_delimiters.reserve(data_values.size() + ilist.size());
    data_delimiters.insert(data_delimiters.cbegin() + pos_as_num,
                           ilist.size(),
                           *(data_delimiters.cbegin() + pos_as_num));

    // adapt delimiters of inserted region
    size_type full_len = 0;
    for (size_type i = 0; i < ilist.size(); ++i, ++first)
    {
        // constant for sized ranges and/or random access ranges, linear otherwise
        if constexpr (sized_range_concept<std::decay_t<decltype(*first)>>)
            full_len += ranges::size(*first);
        else
            full_len += std::distance(ranges::begin(*first), ranges::end(*first));

        data_delimiters[pos_as_num + 1 + i] += full_len;
    }

    // adapt values of inserted region
    auto concatenated = ilist | ranges::view::join | ranges::view::bounded;
    data_values.reserve(data_values.size() + full_len);
    data_values.insert(data_values.cbegin() + data_delimiters[pos_as_num],
                       ranges::begin(concatenated),
                       ranges::end(concatenated));

    // adapt delimiters behind inserted region
    // TODO parallel execution policy or vectorization?
    std::for_each(data_delimiters.begin() + pos_as_num + ilist.size() + 1,
                  data_delimiters.end(),
                  [full_len] (auto & d) { d += full_len; });

    return begin() + pos_as_num;
}

It requires a forward_iterator, but last - first is only available for random access iterators.

We need a test case for this, where we use std::forward_list.

Where to place `test/` utility functions?

#120 introduced /test/include/seqan3_test/tmp_filename.hpp

I would like to discuss where the files should live and what the namespace should be.

We could do it in the following ways:

  • put it into /include/seqan3/test/
  • put it into /include/seqan3_test/
  • put it into /test/include/seqan3/
  • put it into /test/include/seqan3_test/

I think the namespace seqan3::test is already good.

Open issues with doxygen

ordered by priority:

  • need a \complexity keyword for functions
  • requires statements in function signatures are ugly
  • user defined literals are completely ignored; or has this changed in a newer version?
  • concepts are treated as template variables, which is ugly
  • explicit constructor attribute not displayed.
  • concept documentation should be added to/displayed for classes fulfilling this concept
  • doxygen uses its own style, not the one in code, e.g.
insert (const_iterator pos, value_type const &value)
//                                            ^ there should be a space 

pseudoknot_support should have the base class std::false_type

I don't know what pseudoknots are, but the current definition:

template<typename alphabet_type>
struct pseudoknot_support{};

template<typename alphabet_type>
constexpr bool pseudoknot_support_v = pseudoknot_support<alphabet_type>::value;

looks to me as if it should be is_pseudoknot and is_pseudoknot_v?

Whatever the name is it should inherit from std::false_type or std::true_type.
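A minimal sketch of that suggestion, keeping the current name: the primary template inherits from std::false_type, and an alphabet that supports pseudoknots opts in by specialising with std::true_type. The type `toy_structure_alphabet` is a made-up placeholder.

```cpp
#include <type_traits>

// Default: an alphabet does not support pseudoknots.
template <typename alphabet_type>
struct pseudoknot_support : std::false_type
{};

template <typename alphabet_type>
inline constexpr bool pseudoknot_support_v = pseudoknot_support<alphabet_type>::value;

// Hypothetical alphabet that opts in.
struct toy_structure_alphabet
{};

template <>
struct pseudoknot_support<toy_structure_alphabet> : std::true_type
{};

static_assert(!pseudoknot_support_v<char>);
static_assert(pseudoknot_support_v<toy_structure_alphabet>);
```

With the base class in place, `pseudoknot_support_v` is well-formed for every type, instead of failing to compile for types without a specialisation.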

Any thoughts?

alphabets using lookup tables may produce code which seg-faults and / or compiler errors in assign_char

A common implementation of assign_char looks like this:

using char_type = char;
//...
constexpr aa27 & assign_char(char_type const c) noexcept
{
    _value = char_to_value[c];
    return *this;
}

char is a signed type on many platforms, so character values above 127 wrap around to negative values (e.g. 128 becomes -128), which results in a negative index into char_to_value.

This should be easily fixed by changing it to:

using char_type = char;
//...
constexpr aa27 & assign_char(char_type const c) noexcept
{
    using index_t = std::make_unsigned_t<char_type>;
    _value = char_to_value[static_cast<index_t>(c)];
    return *this;
}

alphabet_test should add this corner case.
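The corner case can be sketched with a toy type. `lookup_alphabet_sketch` and its three-letter table are hypothetical, and the table size of 256 assumes an 8-bit char.

```cpp
#include <array>
#include <cstdint>
#include <type_traits>

// Hypothetical alphabet with a 256-entry lookup table (assumes 8-bit char).
struct lookup_alphabet_sketch
{
    using char_type = char;

    // Every character maps to rank 0 unless listed here.
    static constexpr std::array<std::uint8_t, 256> char_to_value = [] () constexpr
    {
        std::array<std::uint8_t, 256> table{};
        table['C'] = 1;
        table['G'] = 2;
        table['T'] = 3;
        return table;
    }();

    std::uint8_t _value{};

    constexpr lookup_alphabet_sketch & assign_char(char_type const c) noexcept
    {
        // Convert to unsigned first so that "negative" chars index into [128, 255].
        using index_t = std::make_unsigned_t<char_type>;
        _value = char_to_value[static_cast<index_t>(c)];
        return *this;
    }
};

// Corner case: a char with the high bit set must not read before the table.
static_assert(lookup_alphabet_sketch{}.assign_char(static_cast<char>(200))._value == 0);
```

A generic alphabet test could loop over all values of `char_type` in the same way and only check that no out-of-bounds access happens.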

alphabet is not working for pod's

The following is currently not possible:

static_assert(alphabet_concept<char>);

When trying to overload the basic structures for char

template <>
struct alphabet_size<char>
{
    // the current definition is wrong, if all values of a type is used. We would need the next bigger data type to fit the value 
    // maybe we should change it to the maximum value a type can hold, because we don't need to use a bigger data type than the rank type
    static constexpr uint16_t value = 1 << (sizeof(char)*8);
};

template <>
struct underlying_rank<char>
{
    using type = char;
};

template <>
struct underlying_char<char>
{
    using type = char;
};

I get the error:

alphabet/concept.hpp:271:26: error: template constraint failure
 struct alphabet_size<char>
                          ^
alphabet/concept.hpp:271:26: note:   constraints not satisfied
alphabet/concept.hpp:271:26: note:     with ‘char c’
alphabet/concept.hpp:271:26: note: the required expression ‘alphabet_type::value_size’ would be ill-formed

The definition of alphabet_size is

template <typename alphabet_type>
    requires requires (alphabet_type c) { alphabet_type::value_size; }
struct alphabet_size;

So it seems, that you can't specialize structures that use concepts. That means we have to rewrite this?

Add a nice wrapper class to avoid implementing pipe-operator for custom views.

Please note, this is a design question/feature proposal.
Inspired by the nice tutorial about implementing your own ranges in the wiki, I was thinking that it would be great to take the burden off of the developers to always implement the pipe-operator for their custom views.
Especially, as it could be very confusing in the usage if stuff like std::bind together with placeholders is used to implement proper pipe-operator delegation for the respective view.
Hence, I came up with a CRTP pattern, that in general would allow us to implement a very handsome
my_view_fn class where I don't need to think about the implementation of the pipe-operator and these kind of stuff.
Of course some stuff has to be dealt with, like proper const-fication, but in general I think that would be the way to go.
It might be possible, that I wasn't thinking of some corner cases so this should be discussed here, and we need to see if we can update the proposal or if we have to withdraw completely.

[-note-begin-
Of course, if the view had two constructors expecting two arguments, and one of them had a default parameter for the second argument, then there would probably be an ambiguous function call. Then again, the overloaded function call operator of view_fn could handle the default argument, which I think keeps this design applicable.
-end-]

But now enough talking. Lean back and enjoy! 😁

#include <iostream>
#include <vector>
#include <utility>
#include <memory>
#include <tuple>

namespace seqan3
{

// The factory that delegates the pipe to a custom call using the lhs of a |-operator
// implemented as a CRTP class.
template <typename derived_view_fn_t>
struct view_factory
{
    // A reference to the derived view_fn object.
    derived_view_fn_t & view_fn;

    // Construction.
    view_factory(derived_view_fn_t & derived) : view_fn(derived)
    {}

    // A special functor that gives us a named type for which we can overload the
    // pipe-operator. NOTE: I tried to use a lambda instead, but obviously, what should
    // the signature of the pipe operator look like? It would get two template parameters,
    // making it applicable for everything.
    template <typename... arg_ts>
    struct view_factory_fn
    {
        // Get access to the underlying view_fn.
        derived_view_fn_t & view_fn;

        // Store the variadic values somewhere.
        std::tuple<arg_ts...> member;

        // constructor.
        view_factory_fn(derived_view_fn_t & fn, arg_ts && ...args) :
            view_fn(fn),
            member{std::forward<arg_ts>(args)...}
        {}

        // helper class to unpack the tuple.
        template <typename rng_t, size_t... Is>
        auto explode(rng_t && rng, std::index_sequence<Is...> const &)
        {
            return view_fn(std::forward<rng_t>(rng), std::forward<arg_ts>(std::get<Is>(member))...);
        }

        // the operator called by the pipe-operator.
        auto operator()(auto && lhs_rng)
        {
            return explode(std::forward<decltype(lhs_rng)>(lhs_rng), std::make_index_sequence<sizeof...(arg_ts)>{});
        };
    };

    // the operator called by the derived class to get a proxy for calling the pipe-operator later on
    // without using the weird interface with std::bind.
    template <typename... arg_ts>
    auto operator()(arg_ts && ...args)
    {
        return view_factory_fn<arg_ts...>{view_fn, std::forward<arg_ts>(args)...};
    }

    // the actual magic pipe-operator with getting view_factory_fn on the rhs,
    // so that it does not get to be called for every type.
    template <typename rng_t, typename... arg_ts>
    friend auto operator|(rng_t && lhs, view_factory_fn<arg_ts...> && callable)
    {
        return callable(std::forward<rng_t>(lhs));
    }
};

// Some test view implementation that does not do a lot.
template <typename rng_t = void>
class my_view_impl
{
protected:

    std::shared_ptr<std::decay_t<rng_t>> data{};
    unsigned member1{};
public:
    my_view_impl() = default;

    my_view_impl(unsigned const m1) noexcept : member1(m1)
    {}

    my_view_impl(rng_t && rng, unsigned const m1) noexcept : data{std::make_shared<std::decay_t<rng_t>>(rng)}, member1{m1}
    {}

    unsigned get() const noexcept
    {
        return member1;
    }
};

// That would be an example .._fn implementation using the CRTP pattern,
// to get almost automatically access to the pipe notation.
struct my_view_fn : protected view_factory<my_view_fn>
{
    // the base class.
    using factory_t = view_factory<my_view_fn>;

    // the constructor.
    my_view_fn() : factory_t(*this)
    {};

    // Just tell the compiler there is a suitable operator() overload that apparently
    // does all the magic for us.
    using factory_t::operator();

    // Now specify what should happen if the final operator is called.
    // This allows us to use the functional call design or the pipe notation.
    template <typename rng_t>
    auto operator()(rng_t && rng, unsigned const p1)
    {
        return my_view_impl<rng_t>(std::forward<rng_t>(rng), p1);
    }
};

namespace view
{
// Well the code above is yet to be const-ified.
my_view_fn /*const*/ my_view;
}
}

using namespace seqan3;

// Some general use cases, that should work.
int main()
{
    { // use case 1: functional;
        std::vector<int> vec{{1, 2, 3, 4, 5}};
        auto v = view::my_view(vec, 4u);
        std::cout << v.get()  << "\n"; // should print 4
    }

    {  // use case 2: pipe version.
        std::vector<int> vec{{1, 2, 3, 4, 5}};
        auto v2 = vec | view::my_view(4u) | view::my_view(3u);
        std::cout << v2.get()  << "\n"; // should print 3
    }
}

where to put the module and submodule meta-includes

Currently we have

alphabet.hpp
alphabet/...

Instead we could:

alphabet/all.hpp
alphabet/...

The second is the model used by range-v3 and it has some benefits which is why I would suggest switching to it:

  • better encapsulation (every header that belongs to the (sub)module is in the folder), this also makes doxygen easier to understand because folders are identical with (sub)modules
  • clear distinction between "meta-headers" and real headers, e.g. by just looking at the files:
    alphabet/concept.hpp
    alphabet/nucleotide.hpp
    
    you don't know whether nucleotide.hpp is a meta-header or a real header, without also going through the list of folders.

Travis changes

We need to

  • switch to a release version of gcc-7 (currently we use an older snapshot) [fixed with #113]
  • add a build with -DNO_CEREAL=1 so that we check that all code runs with and without cereal
  • add a build that builds the user documentation and fails if there is a warning
  • add a build that builds the developer documentation and fails if there is a warning

@rrahn and @marehr Can you look into this after the seqan2.4 dust has settled and #108 is resolved?

noexcept version/macro for function that use assert, i.e. only throw in debug mode

I looked into dna4.hpp and found that basically every function except assign_rank is noexcept.

I assume that is due to assert. But I thought assert does not throw but raises SIGABRT? And when compiling with -DNDEBUG this code will NEVER throw/exit. Is there a macro for noexcept if -DNDEBUG?
-- marehr

That's all true. We need a different assert macro that throws; then we can make the noexcept property of assign_rank forward to the noexcept property of the assertion, which will be true in release mode and false in debug mode.
-- h-2
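A minimal sketch of such a macro pair follows. The names `SKETCH_ASSERT`, `SKETCH_ASSERT_NOEXCEPT` and `toy_dna4` are invented for illustration and are not existing SeqAn3 macros or types.

```cpp
#include <stdexcept>

#ifdef NDEBUG
    // Release mode: the check disappears, so the function can promise noexcept.
    #define SKETCH_ASSERT_NOEXCEPT true
    #define SKETCH_ASSERT(cond) static_cast<void>(0)
#else
    // Debug mode: the check throws, so the function must not be noexcept.
    #define SKETCH_ASSERT_NOEXCEPT false
    #define SKETCH_ASSERT(cond)                                         \
        do                                                              \
        {                                                               \
            if (!(cond))                                                \
                throw std::logic_error{"assertion failed: " #cond};     \
        } while (false)
#endif

// Hypothetical alphabet whose assign_rank forwards the assertion's noexcept property.
struct toy_dna4
{
    unsigned char value{};

    constexpr toy_dna4 & assign_rank(unsigned char const r) noexcept(SKETCH_ASSERT_NOEXCEPT)
    {
        SKETCH_ASSERT(r < 4);
        value = r;
        return *this;
    }
};

// The exception specification follows the build mode.
static_assert(noexcept(toy_dna4{}.assign_rank(0)) == SKETCH_ASSERT_NOEXCEPT);
```

A throwing assertion also makes the failure case testable from a unit test, which a SIGABRT-based assert does not allow easily.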

See original discussion:

union_composition clean-up

depends on #58

  • #68, streamline documentation, in particular:
    • add \param documentation to all functions
    • don't use namespace seqan3::detail::union_alphabet since it doesn't appear in doxygen
    • don't use \privatesection and \publicsection outside of class scope (it doesn't have any effect there)
    • add \ingroup composition where needed!
  • #69, streamline tests, in particular
    • remove redundancy, add to generic test cases
    • don't test _value
    • tests may only depend on previously tested behaviour, there must be no circular dependencies
    • don't use static_asserts, instead use TESTs
  • #70, add noexcept where adequate

if you feel that these issues are not related, feel free to make separate PRs (i.e. not for everything, but for test VS doc for example) 😄

Bring aminoacid submodule up to state

  • add alphabet/aminoacid.hpp meta-header and introduce sub-module for aminoacid
  • remove alphabet/aminoacid/aa27_container.hpp
  • clean-up alphabet/aminoacid/aa27.hpp
    • style-guide, documentation, make statics private, replace table with constexpr lambda...
  • add aa27 to the generic alphabet tests, reduce the aa27 test to the specific parts

Most of this can be easily adapted from the nucleotide submodule.

@sarahet, since you are working on this anyway, could you update it?

Add test case for seqan3::test::tmp_file_name

  • Test filename = nullptr
  • Test if temporary file will be deleted again
  • Test creating multiple temporary files and whether they are all deleted again
  • Should not be copy constructible
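As a sketch of the properties listed above, the following hypothetical class `tmp_file_sketch` (not the actual seqan3::test::tmp_file_name implementation) removes its file on destruction and deletes its copy operations, so the last requirement can be checked at compile time. It takes a std::string, so the nullptr case does not apply to it.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <type_traits>

// Hypothetical RAII wrapper around a path inside the system's temporary directory.
class tmp_file_sketch
{
    std::filesystem::path file_path;

public:
    explicit tmp_file_sketch(std::string const & file_name) :
        file_path{std::filesystem::temp_directory_path() / file_name}
    {}

    // Two owners of the same path would both try to delete it.
    tmp_file_sketch(tmp_file_sketch const &) = delete;
    tmp_file_sketch & operator=(tmp_file_sketch const &) = delete;

    ~tmp_file_sketch()
    {
        std::error_code ec;                      // never throw from a destructor
        std::filesystem::remove(file_path, ec);  // no-op if the file was never created
    }

    std::filesystem::path const & get_path() const noexcept { return file_path; }
};

static_assert(!std::is_copy_constructible_v<tmp_file_sketch>);
static_assert(!std::is_copy_assignable_v<tmp_file_sketch>);
```

The deletion tests would create a file at get_path() inside a scope and check with std::filesystem::exists after the scope ends.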

deep views

In SeqAn we will need/want recursive views quite often, or more precisely, views that modify the innermost range, not the outer range.
This is because we often operate on collections of strings/vectors, e.g. we could have std::vector<dna4_vector> and want to do view::complement. Of course a vector-of-vectors has no complement, so we want the view to do this for the inner vectors. In other cases, e.g. reverse, it is not so clear, because you could reverse either the outer range or the inner ranges.

I would propose

  • all views where it is clear, e.g. views that operate on alphabets, should automatically "recurse"; this will be documented!
  • all other views where we need this often enough, get a foo_r sibling, that has the behaviour, e.g. view::reverse_r

Alternatively, we would never do it automatically, and create _r version for all our views.

@seqan/all what do you think?

documentation of views

  • view properties need to be improved so that people understand the consequences
  • most current views don't have \ingroup view set, yet
  • remove Complexity
  • remove Thread safety and Exception?

create seqan3/core/filesystem.hpp

with roughly this content which is currently in test/include/seqan3/test/tmp_file_name.hpp

#if __has_include(<filesystem>)
#include <filesystem>
namespace seqan3
{
namespace filesystem = std::filesystem;
} // namespace seqan3
#else
#include <experimental/filesystem>
namespace seqan3
{
namespace filesystem = std::experimental::filesystem;
} // namespace seqan3
#endif // __has_include(<filesystem>)

Thereby we can just include <seqan3/core/filesystem.hpp> instead of doing a check for the header and also we can use our sub-namespace instead of doing a check for the namespace.

Also add a check to platform.hpp that either <filesystem> or <experimental/filesystem> needs to exist.

Export code snippets into files that can be tested

As done in seqan2, maybe we should not hardcode cpp example code in the documentation but instead include separate files. Those demo files might then be tested by the testing framework and this ensures that the cpp examples always work.

@xenigmax Does Doxygen support something like \include mycode.cpp?
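Doxygen does provide \include for whole files and \snippet for marked regions of a file. A rough sketch of how the two sides could fit together is shown below; the file path test/snippet/to_upper_demo.cpp, the block name `usage` and both function names are hypothetical.

```cpp
// --- hypothetical file: test/snippet/to_upper_demo.cpp (built by the test suite) ---

//! [usage]
constexpr char to_upper_demo(char const c) noexcept
{
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - 'a' + 'A') : c;
}

static_assert(to_upper_demo('g') == 'G');
//! [usage]

// --- hypothetical library header: the documentation pulls in the marked block ---

/*!\brief Converts a lower-case ASCII letter to upper case.
 *
 * \snippet test/snippet/to_upper_demo.cpp usage
 */
char to_upper_api(char c) noexcept;
```

Doxygen resolves the file name relative to the directories listed in the EXAMPLE_PATH setting, so the snippet directory would have to be added there.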

@seqan/all

Write wiki entry for HowTo write Doxygen

Some of the fixes of the documentation seem not trivial.
It would be good to add a good practice documentation on the developer wiki, so that others do not run into the same issues as a follow up for #127.

Known issues

  • issue with \relates statement
  • issue unresolvable references, e.g. typename seqan3::underlying_char<alphabet_type>::type
  • \defgroup gap ➡️ \defgroup gap Gap
  • operator "" _aa27s ➡️ operator""_aa27s

union_composition clean-up: documentation

This is a detail/discussion ticket of the meta ticket #60

  • add \param documentation to all functions
  • don't use namespace seqan3::detail::union_alphabet since it doesn't appear in doxygen
  • don't use \privatesection and \publicsection outside of class scope (it doesn't have any effect there)
  • add \ingroup composition where needed!

Support concept for doxygen.

Hello.

There are 4 things regarding Concept according to the issue #72.

  1. Use cond syntax to let Doxygen skip parsing Concept part. (since it's not able to parse them)
    https://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdcond

  2. Use the \interface command to manually link concepts into the documentation (they will be shown as C++ interfaces)
    http://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdinterface

  3. Use the \implements and \extends commands to manually show inheritance information of concepts.
    https://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdimplements
    https://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdextends

  4. More explanation regarding functions defined in the concept block. (They will not be shown by 1 )

You can check the detailed explanation in #72.

I tried to do that by myself as requested but found that it's not only a burden for me but also very risky.
Unfortunately, I was not involved in many design discussions, and it is not easy to figure out all the inheritance relationships between concepts.

Instead, I listed all source files that contain 'concept' using grep, and made a list with author names. Please understand that I'm not able to look at all this code and figure out how to link it correctly.
Perhaps it's not a big deal for the original authors (but it is for me). It does not require any hacks or infrastructure-related changes. Please just change the comments in the files that you wrote and claimed as author when you have time.

Since the list was made with grep, there can be some files that do not require any modification. If that's the case, please just check the checkbox. And please let me know about incorrect author assignments.

Sara @sarahet

  • alphabet/nucleotide/nucl16.hpp
  • alphabet/aminoacid/aa27.hpp

David & Hannes @eldariont

Marcel & David @marehr

  • alphabet/gap/gap.hpp
  • alphabet/gap/gapped_alphabet.hpp
  • alphabet/composition/union_composition.hpp
  • alphabet/composition/cartesian_composition.hpp

Rene @rrahn

  • core/concept/core_detail.hpp
  • core/concept/iterator_detail.hpp
  • core/concept/iterator.hpp
  • core/concept/core.hpp
  • core/platform.hpp
  • core.hpp

Joerg @joergi-w

  • io/sequence/sequence_file_format.hpp

Hannes @h-2

  • alphabet/nucleotide/dna5.hpp
  • alphabet/nucleotide/concept.hpp
  • alphabet/nucleotide/dna4.hpp
  • alphabet/nucleotide/rna5.hpp
  • alphabet/nucleotide/rna4.hpp
  • alphabet/detail/convert.hpp
  • alphabet/concept.hpp
  • alphabet/nucleotide.hpp
  • alphabet.hpp
  • container.hpp
  • core/pod_tuple.hpp
  • range/view.hpp
  • range/container.hpp
  • range/view/to_rank.hpp
  • range/view/char_to.hpp
  • range/view/concept.hpp
  • range/view/to_char.hpp
  • range/view/convert.hpp
  • range/view/rank_to.hpp
  • range/concept.hpp
  • range/container/concept.hpp
  • range.hpp

@mariehoffmann

  • alphabet/quality/composition.hpp
  • alphabet/quality/concept.hpp
  • alphabet/quality/aliases.hpp
  • alphabet/quality/illumina18.hpp
  • alphabet/quality.hpp

No author (probably not completed yet)

  • io/alignment/align_file_in.hpp
  • io/alignment/align_file.hpp
  • io/alignment/align_file_out.hpp
  • io/alignment/align_file_detail.hpp
  • io/sequence/sequence_file_in.hpp
  • io/sequence/sequence_file_format_fasta.hpp

<=> operator makes <, >, <=, >=, ==, != redundant

We can reduce a lot of code within the alphabet module, because most alphabets are trivially comparable.


http://en.cppreference.com/w/cpp/language/default_comparisons:

Provides a way to request the compiler to generate consistent relational operators for a class.

In brief, a class that defines operator<=> automatically gets compiler-generated operators ==, !=, <, <=, >, and >=. A class can define operator<=> as defaulted, in which case the compiler will also generate the code for that operator.

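#include <compare> // required for the defaulted operator<=>
#include <set>

class Point {
 int x = 0; // initialised so that comparing default-constructed Points is well defined
 int y = 0;
public:
 auto operator<=>(const Point&) const = default;
 // ... non-comparison functions ...
};

int main() {
 Point pt1, pt2;
 if (pt1 == pt2) { /*...*/ } // ok
 std::set<Point> s; // ok
 s.insert(pt1); // ok
 if (pt1 <= pt2) { /*...*/ } // ok, makes only a single call to <=>
}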

add soft dependency on LEMON

  • create repository seqan/lemon
  • check in lemon subfolder (contains includes)
  • LICENSE file
  • AUTHORS file
  • write README file => document headers only, we changed config.h etc, please do not open issues
  • generate lemon/config.h
    • set all LEMON_HAVE_* to 0
    • make sure that the following variables are not set
      • LEMON_HAVE_LONG_LONG
      • LEMON_USE_PTHREAD
      • LEMON_USE_WIN32_THREADS
      • LEMON_CXX11
      • LEMON_WIN32
  • integrate submodule into seqan
  • adapt platform.hpp
    • detect lemon and perform version check
    • define LEMON_* that have not been defined in config.h
  • adapt build system seqan3-config.cmake
  • update README and documentation
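
The "detect lemon" step in platform.hpp could be sketched roughly as follows. This is only an assumption about how the detection might be done: the macro `TOY_HAS_LEMON` and the constant `toy_has_lemon` are hypothetical names, and the version check is left out because its form depends on what the generated config.h provides.

```cpp
#include <cassert>

// Soft dependency: look for the header, but do not fail if it is missing.
#if defined(__has_include)
#    if __has_include(<lemon/config.h>)
#        include <lemon/config.h>
#        define TOY_HAS_LEMON 1 // hypothetical macro name
#    endif
#endif

// Fall back to "not available" when the header was not found.
#ifndef TOY_HAS_LEMON
#    define TOY_HAS_LEMON 0
#endif

// Code that needs LEMON can branch on this constant (or on the macro).
inline constexpr bool toy_has_lemon = (TOY_HAS_LEMON != 0);
```

Because the macro is always defined to either 0 or 1, downstream code can use a plain `#if TOY_HAS_LEMON` without an extra `defined()` check.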

Doxygen \file should be empty if it is in the same file

Currently we use mostly the \file <filepath> syntax.

For example in file /include/seqan3/alphabet/nucleotide/rna4.hpp

/*!\file alphabet/nucleotide/rna4.hpp
 * [...]
 */

which is basically the absolute path without the /include/seqan3/ part.

In /include/seqan3/alphabet/range.hpp and /include/seqan3/core/platform.hpp we only use the filename itself.

/*!\file platform.hpp
 * [...]
 */

I propose to use the \file syntax (without <filepath>), because according to https://www.stack.nl/~dimitri/doxygen/manual/commands.html#cmdfile:

If the file name is omitted (i.e. the line after \file is left blank) then the documentation block that contains the \file command will belong to the file it is located in.

Manual testing showed that it produces the same documentation as providing a <filepath>.


A pro argument is:

  • You can't forget to rename the <filepath> part, after renaming the actual file

A contra argument:

  • If you see \file alphabet/nucleotide/rna4.hpp - even if you don't know what \file means - you can kind of guess that the following documentation is only for the file alphabet/nucleotide/rna4.hpp, whereas a bare \file may not be descriptive enough
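
For comparison, the proposed path-less form would look like this in a header. The file content (`toy::answer`) is hypothetical; only the `\file` and `\brief` commands are Doxygen's.

```cpp
/*!\file
 * \brief Hypothetical header; the empty \file line binds this block to the file it is located in.
 */

#include <cassert>

namespace toy // hypothetical namespace
{

//!\brief Placeholder function so that the header has some content.
constexpr int answer() noexcept { return 42; }

} // namespace toy
```

Renaming or moving this file requires no change to the comment block, which is the pro argument listed above.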

Discuss whether to give up the seqan3::literal namespace

We originally chose to move our literals into seqan3::literal because the STL does so also. For the STL the reasoning is that many people do using namespace std; and then the std literals might conflict with others.
For SeqAn this is not true from my POV. I don't think many people will do using namespace seqan3; and also using namespace other_sequencelib; that by chance overloads the same literals as we do. In fact we lose nothing by just not supporting this setup.

And it would improve usability for our users, because they can call "ACGT"_dna4 without having to include an extra namespace.
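
A minimal sketch of the two options, using a hypothetical namespace `toy` in place of seqan3 and a hypothetical `dna4_string` type. The second literal is named `_dna4n` only so that both variants can coexist in one example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace toy // hypothetical library, stands in for seqan3
{

struct dna4_string
{
    std::string chars;
};

// Variant 1: the literal lives directly in the library namespace.
inline dna4_string operator""_dna4(char const * s, std::size_t n)
{
    return dna4_string{std::string(s, n)};
}

// Variant 2: the literal is hidden in a nested namespace, as the STL does.
namespace literal
{
inline dna4_string operator""_dna4n(char const * s, std::size_t n)
{
    return dna4_string{std::string(s, n)};
}
} // namespace literal

} // namespace toy

inline std::size_t direct_length()
{
    using namespace toy;          // one directive is enough for variant 1
    return "ACGT"_dna4.chars.size();
}

inline std::size_t nested_length()
{
    using namespace toy::literal; // variant 2 needs this extra directive
    return "ACGT"_dna4n.chars.size();
}
```

With variant 1, a user who already wrote `using namespace toy;` gets the literal for free; with variant 2 the extra directive is needed in every scope that uses it.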

don't force tests to be in release mode and activate warnings

  • remove line 7 from test/CMakeLists.txt so that tests can be built with DEBUG.
  • add -Wall -Wshorten-64-to-32 -Wconversion to the CMAKE_CXX_FLAGS
  • fix the warnings you then encounter :)
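
As an illustration of the kind of fix this usually means, here is a minimal sketch with a hypothetical helper `size_as_u32`. On a 64-bit platform, returning `v.size()` directly would be an implicit narrowing from 64 to 32 bits, which -Wconversion reports; making the conversion explicit (and checking the range in debug builds) silences the warning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: returns the container size as a 32-bit value.
inline std::uint32_t size_as_u32(std::vector<char> const & v)
{
    // Debug builds verify that the value really fits into 32 bits.
    assert(v.size() <= std::numeric_limits<std::uint32_t>::max());

    // `return v.size();` would trigger the conversion warning here.
    return static_cast<std::uint32_t>(v.size());
}
```

The `assert` is only active when NDEBUG is not defined, which is one reason why being able to build the tests in DEBUG mode is useful.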

@cpockrandt since you have been absent from development for a while: this will give you a good overview over the current codebase 😉

Discuss whether to give up the <alphabet>_vector aliases

Should we remove the type alias dna4_vector, dna5_vector,...?

Observations:

  • We don't need those type aliases within the library code itself, because we will always assume containers of alphabet.
  • They are simply convenient for our users and for our example codes

Pro arguments for removing:

  • They are not much shorter than std::vector<dna4>
  • Without them, the true type is not obscured behind the <alphabet>_vector alias (easier for beginners to understand)
  • No (future) developer will assume that the backend of the container is always vector

Pro argument for keeping:

  • It is convenient
  • It enforces a certain structure on our users (they will adopt dna4_vector from our examples)
  • ...

@rrahn commented on #146 (review):

The main obstacle IMPOV is that we should drop the dna|rna alias and maybe also the dna\d+_vector as there seems to be no benefit over std::vector<dna\d+>, does it? But we can discuss this offline.

@h-2 said on #146 (comment):

Hm, I would prefer not to. It's really long. Maybe even something shorter, like dna4vec? But that would go against our principle of not abbreviating... 🤔
I really want something short, though, because it is written often.
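
To make the trade-off concrete, here is a minimal sketch with a hypothetical `toy::dna4`. The alias is purely a second name for the same type, so library code that takes `std::vector<dna4>` accepts both spellings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

namespace toy // hypothetical namespace, stands in for seqan3
{

struct dna4
{
    std::uint8_t rank{};
};

// The alias under discussion: 11 characters instead of 17.
using dna4_vector = std::vector<dna4>;

} // namespace toy

// Both spellings name exactly the same type.
static_assert(std::is_same_v<toy::dna4_vector, std::vector<toy::dna4>>);

// A function written against the spelled-out type accepts the alias as well.
inline std::size_t count_letters(std::vector<toy::dna4> const & v)
{
    return v.size();
}
```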

union_composition clean-up: tests

This is a detail/discussion ticket of the meta ticket #60

  • remove redundancy, add to generic test cases
  • don't test _value
  • tests may only depend on previously tested behaviour, there must be no circular dependencies
  • don't use static_asserts, instead use TESTs

Question: For the Dna alphabet, is ACGT stored in 4 bytes or one?

I am a newbie.

Is ACGT stored in 4 bytes or one?

If I have an alphabet of size 2^k, I want to store each letter in k bits. In particular, I want to be able to pack them into as few bytes as possible. I would also like k to be specified at run time.

Where do I start looking in the code if the answer to the above question is yes?

Mark Stankus
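
To illustrate what is being asked for, here is a minimal sketch of a container that packs ranks into k bits each, with k chosen at run time. `packed_ranks` and `pack` are hypothetical names and not SeqAn3 types; the sketch is limited to 1 <= k <= 8 and says nothing about how the library itself stores its alphabets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical container: stores each rank in `bits` bits (1 <= bits <= 8).
class packed_ranks
{
public:
    explicit packed_ranks(unsigned bits) : bits_{bits} { assert(bits >= 1 && bits <= 8); }

    void push_back(std::uint8_t rank)
    {
        assert(rank < (1u << bits_));
        std::size_t const bit_pos = size_ * bits_;
        std::size_t const byte = bit_pos / 8;
        // Keep one spare byte: a value may straddle a byte boundary.
        if (data_.size() < byte + 2)
            data_.resize(byte + 2, 0);
        unsigned const shifted = static_cast<unsigned>(rank) << (bit_pos % 8);
        data_[byte]     |= static_cast<std::uint8_t>(shifted & 0xFFu);
        data_[byte + 1] |= static_cast<std::uint8_t>(shifted >> 8);
        ++size_;
    }

    std::uint8_t operator[](std::size_t i) const
    {
        std::size_t const bit_pos = i * bits_;
        std::size_t const byte = bit_pos / 8;
        // Read two bytes so that straddling values are reassembled correctly.
        unsigned const word = static_cast<unsigned>(data_[byte])
                            | (static_cast<unsigned>(data_[byte + 1]) << 8);
        return static_cast<std::uint8_t>((word >> (bit_pos % 8)) & ((1u << bits_) - 1u));
    }

    std::size_t size() const noexcept { return size_; }

    // Bytes that carry payload (the spare byte is not counted).
    std::size_t bytes_used() const noexcept { return (size_ * bits_ + 7) / 8; }

private:
    unsigned bits_;
    std::size_t size_{0};
    std::vector<std::uint8_t> data_;
};

// Convenience helper for the usage example below.
inline packed_ranks pack(unsigned bits, std::vector<std::uint8_t> const & ranks)
{
    packed_ranks p{bits};
    for (std::uint8_t const r : ranks)
        p.push_back(r);
    return p;
}
```

With k = 2, the four ranks of ACGT (0, 1, 2, 3) occupy 4 × 2 = 8 bits, so `pack(2, {0, 1, 2, 3}).bytes_used()` is one byte rather than four.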

Doxygen warnings

We really need to get rid of the warnings so that we have a clean state once and for all.

This is the list of files and responsible people:

@h-2:

  • seqan3/include/seqan3/alphabet/adaptation/all.hpp
  • seqan3/include/seqan3/alphabet/concept_pre.hpp
  • seqan3/include/seqan3/alphabet/nucleotide/all.hpp
  • seqan3/include/seqan3/alphabet/nucleotide/dna4.hpp
  • seqan3/include/seqan3/alphabet/nucleotide/dna5.hpp
  • seqan3/include/seqan3/alphabet/nucleotide/nucl16.hpp
  • seqan3/include/seqan3/alphabet/nucleotide/rna4.hpp
  • seqan3/include/seqan3/alphabet/nucleotide/rna5.hpp
  • seqan3/include/seqan3/alphabet/all.hpp

@rrahn:

  • seqan3/include/seqan3/core/concept/core_detail.hpp
  • seqan3/include/seqan3/core/concept/iterator_detail.hpp

@sarahet:

  • seqan3/include/seqan3/alphabet/aminoacid/aa27.hpp
  • seqan3/include/seqan3/alphabet/aminoacid/all.hpp

@marehr:

  • seqan3/include/seqan3/alphabet/composition/all.hpp
  • seqan3/include/seqan3/alphabet/gap/all.hpp
  • seqan3/include/seqan3/alphabet/gap/gap.hpp

@mariehoffmann:

  • seqan3/include/seqan3/alphabet/quality/concept.hpp
  • seqan3/include/seqan3/alphabet/quality/illumina18.hpp
  • seqan3/include/seqan3/container.hpp

These files should be excluded from doxygen for now:

  • seqan3/include/seqan3/io/alignment/align_file_detail.hpp
  • seqan3/include/seqan3/io/alignment/align_file_in.hpp
  • seqan3/include/seqan3/io/alignment/align_file.hpp
  • seqan3/include/seqan3/io/sequence/sequence_file_format.hpp
  • seqan3/include/seqan3/io/sequence/sequence_file_in.hpp

Add proper file I/O Exceptions

We need to port some file I/O exceptions.

  • FileOpenError - error indicating that a file could not be opened.
  • ParseError - error indicating that parsing the file failed.
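
A minimal sketch of what the two ported exceptions could look like. The snake_case names `file_open_error` and `parse_error`, the namespace `toy`, and the helper functions are assumptions for illustration; deriving from `std::runtime_error` is one possible design, not a decision taken in this ticket.

```cpp
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>

namespace toy // hypothetical namespace
{

// Thrown when a file could not be opened.
struct file_open_error : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Thrown when parsing the file content failed.
struct parse_error : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Hypothetical helper: opens a file for reading or throws file_open_error.
inline std::ifstream open_or_throw(std::string const & path)
{
    std::ifstream in{path};
    if (!in.is_open())
        throw file_open_error{"Could not open file " + path + " for reading."};
    return in;
}

// Hypothetical helper: parses one decimal digit or throws parse_error.
inline int parse_digit(char c)
{
    if (c < '0' || c > '9')
        throw parse_error{std::string{"Expected a digit, got '"} + c + "'."};
    return c - '0';
}

// Small wrappers that report whether the expected exception was thrown.
inline bool throws_parse_error(char c)
{
    try { parse_digit(c); } catch (parse_error const &) { return true; }
    return false;
}

inline bool throws_file_open_error(std::string const & path)
{
    try { open_or_throw(path); } catch (file_open_error const &) { return true; }
    return false;
}

} // namespace toy
```

Keeping the two types separate lets callers distinguish "the path is wrong" from "the content is malformed" with different `catch` clauses.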

structure alphabets

I think we should also add the structure formats that are alphabets to the alphabet module.

The following would make sense from my POV:

alphabet/structure/all.hpp
alphabet/structure/concept.hpp
alphabet/structure/dot_bracket3.hpp    // rna: . ( )
alphabet/structure/wuss18.hpp    // rna: .,;<>(){}[]AaBb.-_
alphabet/structure/dssp9.hpp   // protein: HGIEBTSCX

Wuss could probably be bigger as well since all character pairs are allowed, I think...

@joergi-w What do you think? Can you work on this?
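
As a rough idea of the smallest of these, here is a minimal sketch of a three-letter dot-bracket alphabet. The class lives in a hypothetical namespace `toy`; the member names (`to_char`, `to_rank`, `assign_char`, `value_size`) and the choice to map unknown characters to '.' are assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

namespace toy // hypothetical namespace
{

// Sketch of a dot-bracket alphabet for RNA structure: '.', '(' and ')'.
class dot_bracket3
{
public:
    // Number of distinct letters.
    static constexpr std::uint8_t value_size = 3;

    constexpr char to_char() const noexcept { return rank_to_char[rank_]; }
    constexpr std::uint8_t to_rank() const noexcept { return rank_; }

    // Assumption of this sketch: unknown characters are mapped to '.' (unpaired).
    constexpr dot_bracket3 & assign_char(char c) noexcept
    {
        rank_ = static_cast<std::uint8_t>((c == '(') ? 1 : (c == ')') ? 2 : 0);
        return *this;
    }

private:
    static constexpr std::array<char, 3> rank_to_char{'.', '(', ')'};
    std::uint8_t rank_{0};
};

} // namespace toy
```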

Make all seqan3 objects "streamable".

I think alphabet strings like dna4_string should be streamable, since there are several use cases.

Next to the obvious std::cout << my_dna_string, I might also want to stream into a dna4_string via an std::istringstream. In the case (more unlikely, I know) that a dna4_string is a user-defined command line input (maybe "give a promoter sequence"), this is currently not possible because I require option input types to be std::istringstream convertible.

@h-2, @joergi-w, @marehr what do you think? (as git blames you as authors of the alphabet module)
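
A minimal sketch of what "streamable" could mean here, using a hypothetical `toy::dna4` and a `dna4_string` modelled as `std::vector<dna4>`. Setting the failbit on invalid characters is an assumption of this sketch, chosen because it is how standard stream extraction usually signals a failed conversion.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace toy // hypothetical namespace, stands in for seqan3
{

struct dna4
{
    char c{'A'};
};

using dna4_string = std::vector<dna4>; // hypothetical definition

// Output: write every letter as its character.
inline std::ostream & operator<<(std::ostream & os, dna4_string const & s)
{
    for (dna4 const letter : s)
        os << letter.c;
    return os;
}

// Input: read one whitespace-delimited token and validate each character.
inline std::istream & operator>>(std::istream & is, dna4_string & s)
{
    std::string token;
    if (is >> token)
    {
        s.clear();
        for (char const ch : token)
        {
            if (ch != 'A' && ch != 'C' && ch != 'G' && ch != 'T')
            {
                is.setstate(std::ios_base::failbit); // signal the invalid input
                s.clear();
                break;
            }
            s.push_back(dna4{ch});
        }
    }
    return is;
}

// Reads a dna4_string from text and writes it back out.
inline std::string round_trip(std::string const & in)
{
    std::istringstream is{in};
    dna4_string s;
    is >> s;
    std::ostringstream os;
    os << s;
    return os.str();
}

} // namespace toy
```

With both operators in place, an argument parser that converts options through `std::istringstream` can accept a `dna4_string` like any built-in type and detect bad input by checking the stream state.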
