Giter Site home page Giter Site logo

epplus.dataextractor's Introduction

Build status

Summary

This is a simple EPPlus extension that make easier to extract POCO from spreadsheets.

Installation

You can download the library from nuget with the following command:

Install-Package EPPlus.DataExtractor

If you don´t have the EPPlus as a dependency, nuget will install it for you.

Usage

Using the package is really simple. Let's work with a simple example first.

Consider you want to extract the following data from a spreadsheet:

First table

Just a define a class to hold the data.

public class SimpleRowData
{
    public string CarName { get; set; }

    public double Value { get; set; }

    public DateTime CreationDate { get; set; }
}

Considering that the data is on a worksheet named worksheet1, this is the code that can be used to retrieve an IEnumerable of SimpleRowData:

using (var package = new ExcelPackage("spreadsheet/file/location/file.xlsx"))
{
    var cars = package.Workbook.Worksheets["worksheet1"]
        .Extract<SimpleRowData>()
        // Here we can chain multiple definition for the columns
        .WithProperty(p => p.CarName, "B")
        .WithProperty(p => p.Value, "C")
        .WithProperty(p => p.CreationDate, "D")
        .GetData(4, 6) // To obtain the data we indicate the start and end of the rows.
        // In this case, the rows 4, 5 and 6 will be extracted.
        .ToList();
}

Yeah, it is that simple!

Callbacks

You can also use callbacks for filling properties values, where you can put custom code, validations and even abort the rest of the execution. You can specify a callback that is executed over the object ( setPropertyValueCallback ) value or over the casted TValue ( setPropertyCastedValueCallback ) type.

The first parameter is of type PropertyExtractionContext, that contains data about the cell address used to populate the property ( PropertyExtractionContext.CellAddress ) and a Abort() method that can be used to cancel the rest of the processing for the entire extraction. The rows extracted before the execution of the Abort will be returned by the GetData method, and an entity for the current row will also be returned with all the previous properties populated.

The following code is based on the previous example and uses the setPropertyCastedValueCallback to print a message based on the value of the cell:

using (var package = new ExcelPackage("spreadsheet/file/location/file.xlsx"))
{
    var cars = package.Workbook.Worksheets["worksheet1"]
        .Extract<SimpleRowData>()
        .WithProperty(p => p.CarName, "B")
        .WithProperty(p => p.Value, "C")
        .WithProperty(p => p.CreationDate, "D",
            setPropertyCastedValueCallback: (propContext, creationDate) =>
            {
                if(creationDate < new DateTime(2000, 1, 1)) {
                    Console.WriteLine("The car in row {0} is here for too long, no one will buy it", propContext.CellAddress.Row);
                }

                // We could also abort if the date time value is not set (i.e. is the default one):
                /*
                if(creationDate == default(DateTime))
                {
                    Console.WriteLine("Invalid value in cell {0}!", propContext.CellAddress.Address);
                    propContext.Abort();
                }
                */
            })
        .GetData(4, 6)
        .ToList();
}

Columns that should be rows

Sometimes the tables defined in spreadsheets does not have a friendly structure for a developer. Instead of creating multiple tables and foreign key relationships in excel it is simpler to put data that should go into different tables as columns in the existing table. It'll be clear with the following example:

Table 2

Imagine that you want to extract this data into a class structure where the columns that indicates months/year should be a collection inside the entity that will keep the row data. So we can have two classes defined like that:

public class RowDataWithColumnBeingRow
{
    public string Name { get; set; }

    public int Age { get; set; }

    public List<ColumnData> MoneyData { get; set; }
}

public class ColumnData
{
    public double ReceivedMoney { get; set; }

    public DateTime Date { get; set; }
}

You can use the following code to extract the spreadsheet data to these classes:

using (var package = new ExcelPackage("spreadsheet/file/location/file.xlsx"))
{
    var data = package.Workbook.Worksheets["worksheet1"]
        .Extract<RowDataWithColumnBeingRow>()
        .WithProperty(p => p.Name, "F")
        .WithProperty(p => p.Age, "G")

        // Here, the collection property is defined using the "WithCollectionProperty" method.
        // The following parameter is the expression indicating the property of "ColumnData"
        // that will be used to receive the header data followed by an integer indicating the row
        // that contains the header.
        // The last expression indicates the other "ColumnData" property, this one will receive
        // the row data. The two last strings are the start and end column from where
        // this data will be extracted.
        .WithCollectionProperty(p => p.MoneyData,
            item => item.Date, 1,
            item => item.ReceivedMoney, "H", "S")
        .GetData(2, 4)
        .ToList();
}

In this previous example the objective was to extract columns and their headers into a collection of custom types inside property inside the main row class type. If you have various columns and the just want the value of them (i.e. the header is meaningless) there is an easier way of doing it as you can see in the next example

Table 2

Consider the model for the rows as:

public class MultiLingualUserData
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public List<string> LanguagesSpoken { get; set; }
}

You could extract the columns of the language spoken to the property LanguagesSpoken without hassle with the following code:

using (var package = new ExcelPackage("spreadsheet/file/location/file.xlsx"))
{
    var data = package.Workbook.Worksheets["worksheet3"]
        .Extract<MultiLingualUserData>()
        .WithProperty(p => p.FirstName, "A")
        .WithProperty(p => p.LastName, "B")

        // Here, the collection property is defined an overload of the "WithCollectionProperty" method.
        // This overload simply expects a property of List, Collection or HashSet followed by
        // the letters that indicate the interval of the columns from where the data will be
        // extracted
        .WithCollectionProperty(p => p.LanguagesSpoken, "C", "E")
        .GetData(2, 4)
        .ToList();
}

The GetData method returns an IEnumerable, and this IEnumerable is not evaluated until you interate through it or execute something like ToList over it. So make sure you'll do one of those things before disposing the EPPlus ExcelPackage.

epplus.dataextractor's People

Contributors

ipvalverde avatar trbngr avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

epplus.dataextractor's Issues

Anonymous types

Hi, good job with the DataExtractor!

But what about anonymous types? Can i simply extract all the data to List?

What happen if i haven't any POCO class but want the data well formatted from the worksheet?

Regards

add example for Extract convertDatafunc

excellent project
just a new example in case someone is looking for it:
rmWorkSheet.Extract<RawMaterial>() // Here we can chain multiple definition for the columns .WithProperty(p => p.Id, "A") .WithProperty(p => p.Name, "B", fnTrimString) .WithProperty(p => p.Code, "C", fnTrimString)

fnTrimString is converting, fot this example, input object to string
with
Func<object, string> fnTrimString = objStr => { if (objStr == null) return ""; return objStr.ToString().Trim(); };

Extract data with Collections explanation

This is an excellent project, thank you @ipvalverde for the time. I have a quick question
How do i extract this data into this ScoresPoco

public class ScoresPoco
{
    public int ApplicantID { get; set; }
    public List<ScoreData> Scores { get; set; }
}

public class ScoreData
{
    public int SubjectID { get; set; }
    public double Score { get; set; }
}

image

Subject ID is in Row 2 hidden for both C & D columns

I am extracting with this

var data = worksheet.Extract<ScoresPoco>()
.WithProperty(p => p.ApplicantID, "B")
.WithCollectionProperty(p => p.Scores,
                        item => item.SubjectID, 1,
                        item => item.Score, "C", "E")
.GetData(model.DataRowStart, row => worksheet.Cells[row, model.DataColStart].Value != null)
.ToList();

This only returns the ApplicantID and Scores correctly but the subjectID is always 0.
There can also be more than 2 or Subjects, but all the IDs will be in the Row 2.
When debugging, I can see all that data in the worksheet but a way to add it to the SubjectID is my question.

Strong Name Errorr

Moin,

Strong Name Error, can you help

System.IO.FileLoadException: "Die Datei oder Assembly "EPPlus.DataExtractor, Version=2.0.0.0, Culture=neutral, PublicKeyToken=null" oder eine Abhängigkeit davon wurde nicht gefunden. Eine Assembly mit starkem Namen ist erforderlich.

GetData RowTo

If we want to read all rows how do we know the RowTo value of a file before uploading it?

Guid as Property value

Can the POCO have a Guid or as data type?

I am getting an error because a Guid can be parse but not casted.

System.InvalidCastException : Invalid cast from 'System.String' to 'System.Guid'.
at System.Convert.DefaultToType(IConvertible value, Type targetType, IFormatProvider provider)
at OfficeOpenXml.Utils.ConvertUtil.GetTypedCellValue[T](Object value)
at EPPlus.DataExtractor.PropertyValueSetter2.SetPropertyValue(TModel dataInstance, ExcelRangeBase cell) at EPPlus.DataExtractor.DataExtractor1.d__8.MoveNext()

WithCollectionProperty Icollection

I use Entity framework virtual ICollection for proxy creation
how can I use WithCollectionProperty with ICollection please ?

There are constructors for HashSet, List and Collection

If I try to add a new constructor like

public ICollectionPropertyConfiguration<TRow> WithCollectionProperty<TCollectionItem>(
        Expression<Func<TRow, ICollection<TCollectionItem>>> propertyCollection,
        string startColumn, string endColumn) where TCollectionItem : class
    {
        var collectionConfiguration = new SimpleCollectionColumnDataExtractor<TRow, ICollection<TCollectionItem>, TCollectionItem>
            (propertyCollection, startColumn, endColumn);

        this.simpleCollectionColumnSetters.Add(collectionConfiguration);

        return this;
    }

but I get an error : ICollection must be a non abstract type with a public parameterless constructor

thank you

extended WithCollectionProperty

Hi ipalverde,
this is an excellent project !
This is not an issue, it's an improvment request

How can I achieve this ?

  1. WithCollectionProperty : I need to extract 2 columns , not just one : Id and Value
    in screenshot : Id is RMID and value is : RM poids
  2. I don't know by advance how many columns ID,value I can have , I would need to iterate through columns untill column name is empty
    example
    I've tried to customize your code but I can't get it to work:
    example2
    example3

.NET 4.0

Would it be possible to get a version built against .NET 4.0 in nuget?

convertDataFunc

is it possible to pass convertDataFunc cell instead of cell.Value?

Upgrade EPPlus NuGet dependency

I noticed that although the package's dependency description says it depends on EPPlus (>= 4.5.2.1), it actually depends on EPPlus (== 4.5.2.1) based on the code.

I am currently having problem with this since our projects are using EPPlus 4.5.3.2. which is the latest stable version.

I tried to download the solution and upgraded the package reference. All tests in EPPlus.DataExtractor.Tests work fine with EPPlus 4.5.3.2.

Is it possible to modify the project in order to allow all EPPlus packages with version equal to or greater than 4.5.2.1?

[Request] WithProperty Row Skipping, Casting Override, Error Handling, & Data sanitization

Row Skipping

In the setPropertyCallbackValue callback, I can currently call Abort to stop processing. I would like to be to call Skip or Ignore to simply omit the current row. The ability to call a SkipProperty function would also be extremely useful; this would continue processing the row, but not attempt to convert the current property.

In my current project, I am currently achieving Skip functionality using a helper method like this:

        public List<T> Read<T>(string filename, string wsName, Func<ExcelWorksheet, ICollectionPropertyConfiguration<T>> txObjs, int startRow, Func<T, bool> valid = null) where T: class, new()
        {
            using (var xl = new ExcelPackage(new FileInfo(filename)))
            {
                var ws = xl.Workbook.Worksheets[wsName];

                var retVal = new List<T>();
                if (ws == null)
                    return retVal;

                var tgt_cnt = ws.Dimension.Rows - startRow + 1; //number of rows to read before complete
                var read_cnt = 0; // tracks # of read rows
                while (read_cnt < tgt_cnt)
                {
                    var vals = txObjs(ws).GetData(startRow + read_cnt, ws.Dimension.Rows);
                    read_cnt += vals.Count();
                    retVal.AddRange(valid == null ? vals : vals.Where(valid));                    
                }

                return retVal;
            }
        }

The txObjs parameter is a configuration broken out like this:

        public static ICollectionPropertyConfiguration<Person> PersonCfg(ExcelWorksheet ws)
            => ws.Extract<Person>()
                 .WithProperty(p => p.PersonId,  "A")
                 .WithProperty(p => p.LastName,  "B")
                 .WithProperty(p => p.FirstName, "C");

Casting Override

In one of the WithProperty callbacks (whichever makes more sense internally), I'd like to be able to call an OverrideCast method that allows me to write code to convert from object to the property type. This is for situations where there may need to be special logic for a specific property (like dirty data that sometimes contains a "0" instead of an empty string). This is particularly important because overriding the cast can prevent an error when casting an invalid value, such as an error.

Error Handling

Excel has a number of built-in error types. In one of the WithProperty callbacks (or as a separate callback), I'd like to handle the case of having an error value (e.g. "#N/A"). This could also be handled manually via the Casting Override change, but having an explicit error handling mechanism would provide a much more focused event to handle.

Data Sanitization

After the data has been converted to the destination type, I would like the ability to "sanitize" the data (or, if you prefer, transform it). For example, I may wish to convert "123456" to "R-123456", or trim strings. This probably makes sense as its own callback.


I think implementing these features would make this extension incredibly powerful for data ingestion and allow for very succinct and expressive code.

Merged rows

Hello, amazing job with DataExtractor!

but, is some solution about merged rows?

some like copying the merged row value on every List who has that value? or some kind of collection way?

Extend termination criteria

Hi,

I really like the simple approach, but imho DataExtractor.GetData() termination is lacking: it can either terminate based on the row number or on the value of a single property.
I propose to extend the termination test like this:

`
///


/// Obtains the entities for the columns previously configured.
/// The indicates the initial row that will be read,
/// the data extraction will only occur while the predicate returns true.
/// It'll get executed receiving the row index as parameter before extracting the data of each row.
///

/// The initial row to start the data extraction.
/// The condition that is evaulated for each row. The last fetched record is
/// passed as the argument to the predicate. The condition is evaluated before the row under test is
/// returned. Row fetching stops when the test returns false.
/// Returns an with the data of the columns.
public IEnumerable GetDataUntil(int fromRow, Func<int, TRow, bool> whileFunc)
{
if (whileFunc is null)
throw new ArgumentNullException(nameof(whileFunc));

        int row = fromRow;
        while(true)
        {
            var dataInstance = new TRow();

            bool continueExecution = true;
            for (int index = 0; continueExecution && index < this.propertySetters.Count; index++)
                continueExecution = this.propertySetters[index].SetPropertyValue(dataInstance, row, this.worksheet.Cells);

            if (!continueExecution)
            {
                yield return dataInstance;
                break;
            }

            foreach (var collectionPropertySetter in this.collectionColumnSetters)
                collectionPropertySetter.SetPropertyValue(dataInstance, row, this.worksheet.Cells);

            foreach (var simpleCollectionColumnSetter in this.simpleCollectionColumnSetters)
                simpleCollectionColumnSetter.SetPropertyValue(dataInstance, row, this.worksheet.Cells);

            if(!whileFunc(row, dataInstance))
                break;

            yield return dataInstance;

            row++;
        }
    }

`

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.