sethjuarez / numl
Machine Learning for .NET
Home Page: http://numl.net
License: MIT License
The col value is frequently returned as -1:
// uh oh, need to return something?
// a weird node of some sort...
// but just in case...
if (col == -1)
BuildLeafNode(y.Mode());
Data feed function:
public static IEnumerable<Value> GetData()
{
Random r = new Random (500) ;
string rs = "";
for (int i =0; i < 2000; i++)
{
var a = r.Next(1, 500);
var sum = i;
if (sum <= 100)
rs = "s";
else if (sum > 100 && sum <= 250)
rs = "m";
else
rs = "l";
yield return new Value { V1 = 2, V2 = i , R = rs };
}
}
Hi Seth,
I am trying to use one integer as the feature and one DateTime as the label for the LinearRegression generator.
I get a "Dimensions do not match" error from the Learner.Learn method.
The training data passed in looks like this:
{"Timestamp": "2016-06-06T15:49:46.6420000Z", "num1":32}
Below is the stack trace:
at numl.Math.LinearAlgebra.Vector.Dot(Vector one, Vector two)
at numl.Supervised.Regression.LinearRegressionModel.Predict(Vector x)
at numl.Learner.GenerateModel(IGenerator generator, Matrix x, Vector y, IEnumerable`1 examples, Double trainingPct, Int32 total)
at numl.Learner.Learn(IEnumerable`1 examples, Double trainingPercentage, Int32 repeat, IGenerator generator)
at MyLib.Repo.GenerateMyModel(List`1 trainingData)
Can you please help?
Hi Seth,
I'm getting an exception when trying to deserialize from a stream. Please check.
The error and the code are attached below
ERROR:
Unhandled Exception: System.InvalidOperationException: There is an error in XML document (0, 0). ---> System.Xml.XmlException: Root element is missing.
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
at System.Xml.XmlTextReaderImpl.Read()
at System.Xml.XmlTextReader.Read()
at System.Xml.XmlReader.MoveToContent()
at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderDecisionTreeModel.Read1_DecisionTreeModel()
--- End of inner exception stack trace ---
at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
at System.Xml.Serialization.XmlSerializer.Deserialize(Stream stream)
at numl.Supervised.Model.Load(Stream stream) in c:\projects\numl\numl\Supervised\Model.cs:line 82
at numl.Supervised.DecisionTree.DecisionTreeModel.Load(Stream stream) in c:\projects\numl\numl\Supervised\DecisionTree\DecisionTreeModel.cs:line 72
class Program
{
static void Main(string[] args)
{
var data = new List<MyData>();
for (var i = 0; i < 1000; i++)
{
data.Add(new MyData { Prop1 = i, Prop2 = i + 1, Prop3 = i + 2, Result = i % 2 == 0 });
}
var d = Descriptor.Create<MyData>();
var g = new DecisionTreeGenerator(d);
g.SetHint(false);
var learningModel = Learner.Learn(data, 0.80, 1000, g);
Console.WriteLine(learningModel);
string modelXml;
using (var ms = new MemoryStream())
{
learningModel.Model.Save(ms);
ms.Position = 0;
using (var reader = new StreamReader(ms, Encoding.Unicode))
{
modelXml = reader.ReadToEnd();
}
}
// now read
var m = new DecisionTreeModel();
using (var stream = new MemoryStream())
{
using (var sr = new StreamWriter(stream, Encoding.Unicode))
{
sr.Write(modelXml);
stream.Position = 0;
// THIS GIVES AN ERROR
m = (DecisionTreeModel)m.Load(stream);
}
}
}
}
public class MyData
{
[Feature]
public double Prop1 { get; set; }
[Feature]
public double Prop2 { get; set; }
[Feature]
public double Prop3 { get; set; }
[Label]
public bool Result { get; set; }
}
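One thing that stands out in the repro above, offered as a possible cause rather than a confirmed diagnosis: in the read-back part, the StreamWriter is never flushed before stream.Position is reset, so the MemoryStream may still be empty when Load reads it, which would match "Root element is missing". The sketch below uses only System.IO types (no numl calls); the class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamRoundTrip
{
    // Writes text to a MemoryStream through a StreamWriter, rewinds the
    // stream, and reads it back. When flush is false the text is still in
    // the writer's buffer at rewind time, so the reader sees an empty stream.
    public static string WriteThenRead(string text, bool flush)
    {
        using (var stream = new MemoryStream())
        {
            // Not disposed here: disposing the writer would also close the stream.
            var writer = new StreamWriter(stream, Encoding.Unicode);
            writer.Write(text);
            if (flush)
            {
                writer.Flush(); // push buffered characters into the stream
            }
            stream.Position = 0;

            using (var reader = new StreamReader(stream, Encoding.Unicode))
            {
                return reader.ReadToEnd();
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Without a flush nothing has reached the stream yet.
        Console.WriteLine(StreamRoundTrip.WriteThenRead("<root />", false).Length); // prints 0
        // With a flush the text survives the round trip.
        Console.WriteLine(StreamRoundTrip.WriteThenRead("<root />", true)); // prints <root />
    }
}
```

If this is the cause, calling sr.Flush() before stream.Position = 0 in the repro would be the smallest change to try.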
I've found what appears to be a bug in LinearRegressionModel.Load/Save methods.
I get this error when loading a previously saved model.
Attached model and code to reproduce it - just run the code twice to repro the problem.
Unhandled Exception: System.InvalidOperationException: There is an error in XML document (18, 6). ---> System.InvalidOperationException: There is an error in XML document (18, 6). ---> System.InvalidOperationException: was not expected.
at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderVector.Read1_v()
--- End of inner exception stack trace ---
at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
at numl.Utils.Xml.Read[T](XmlReader reader) in c:\projects\numl\numl\Utils\Xml.cs:line 152
at numl.Supervised.Regression.LinearRegressionModel.ReadXml(XmlReader reader) in c:\projects\numl\numl\Supervised\Regression\LinearRegressionModel.cs:line 76
at System.Xml.Serialization.XmlSerializationReader.ReadSerializable(IXmlSerializable serializable, Boolean wrappedAny)
at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderLinearRegressionModel.Read1_LinearRegressionModel()
---CODE-----
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using numl;
using numl.Model;
using numl.Supervised;
using numl.Supervised.Regression;
namespace MachineLearningByFormulaDemo
{
class Program
{
static void Main(string[] args)
{
// generate data to train our linear regression model
var rnd = new Random();
Func<double, double, double> func = (l, r) => l + 2 * r;
IModel model;
var data = new List<ModelItem>();
const string modelFileName = "test.mdl";
if (File.Exists(modelFileName))
{
model = new LinearRegressionModel().Load(modelFileName);
}
else
{
for (var i = 0; i < 100; i++)
{
var left = rnd.NextDouble(0, 50000);
var right = rnd.NextDouble(0, 50000);
var result = func(left, right);
data.Add(new ModelItem { LeftOperand = left, RightOperand = right, Result = result });
}
var d = Descriptor.Create<ModelItem>();
var g = new LinearRegressionGenerator { Descriptor = d };
var learningModel = Learner.Learn(data, .80, 1000, g);
model = learningModel.Model;
model.Save(modelFileName);
}
// test our trained model with some sample data
for (var i = 0; i < 10; i++)
{
var left = rnd.NextDouble(0, 50000);
var right = rnd.NextDouble(0, 50000);
var item = new ModelItem { LeftOperand = left, RightOperand = right /* Result will be predicted */};
model.Predict(item);
var predictedResult = item.Result;
var expectedResult = func(left, right);
var diff = Math.Abs(expectedResult - predictedResult) / expectedResult;
Console.WriteLine("Expected: {0:N2}, Predicted: {1:N2}, Difference: {2:P}", expectedResult, item.Result, diff);
}
}
}
public class ModelItem
{
[Feature]
public double LeftOperand { get; set; }
[Feature]
public double RightOperand { get; set; }
[Label]
public double Result { get; set; }
}
public static class RandomExtensions
{
public static double NextDouble(
this Random random,
double minValue,
double maxValue)
{
return random.NextDouble() * (maxValue - minValue) + minValue;
}
}
}
What is the difference between the [Label] and [StringLabel] attributes?
Hi Seth,
I get this error at random times with the same data. It occurs either at the learning step in the code below:
generator = new DecisionTreeGenerator(5) { Descriptor = descriptor, Hint = 0 };
learningModel = Learner.Learn(data, 0.8, 5, generator);
or while deserializing the model JSON, like this:
JsonReader.ReadJson(jsonData);
The error really does appear randomly, and it goes away with the same data after a few retries.
Can you please help me track it down?
After using
void Save(string file);
to save the trained model, how do I load the model again?
Add ability to save and load models
I've been playing with K-Means the last few days and found some clusters with some data I'm interested in, which is great! I'd love to be able to rank the members of the clusters I am interested in by distance from the cluster center.
It doesn't look like the data available in the Cluster type offers the ability to do that. Is there something I'm missing? I see that there is a vector for the center of the cluster. I think the piece that I am missing is the vector for each member. If I had that, I could then do something like a Euclidean distance calculation to determine the ranking that I want. Does this sound like the right path?
Any thoughts are certainly appreciated!
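The ranking described above can be sketched independently of numl, assuming each member and the cluster center are available as plain double[] vectors of the same length. How to obtain the member vectors from numl's Cluster type is exactly the open question in the post, so that part is not shown; ClusterRanking and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClusterRanking
{
    // Straight-line (Euclidean) distance between two vectors of equal length.
    public static double EuclideanDistance(double[] a, double[] b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Vectors must have the same length.");
        }
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.Sqrt(sum);
    }

    // Returns the members ordered from closest to farthest from the center.
    public static List<double[]> RankByDistance(double[] center, IEnumerable<double[]> members)
    {
        return members.OrderBy(m => EuclideanDistance(m, center)).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var center = new double[] { 0, 0 };
        var members = new List<double[]>
        {
            new double[] { 3, 4 }, // distance 5
            new double[] { 1, 0 }, // distance 1
            new double[] { 0, 2 }  // distance 2
        };

        foreach (var m in ClusterRanking.RankByDistance(center, members))
        {
            Console.WriteLine(string.Join(", ", m));
        }
    }
}
```

If the features have very different scales, the same normalization used for clustering would need to be applied before measuring distance.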
First, hats off for this great library; it's so much better designed than Encog or Accord.NET, and .NET Core support, yay!
On to my question:
You recently added GRU support to the library, and I was wondering how to use it for time series prediction. (I know the theory behind it; I am just wondering how to use descriptors, and the library in general, to achieve this.)
Let's assume we have these classes. (I am using a simple, artificial example here; I know this is not a good model, but it shows the principle.)
class SensorReading
{
DateTime TimeStamp;
double Temperature;
}
Let's assume we have those readings at 1-minute intervals, so we get, say, a List<SensorReading> with 1,000,000 entries.
Now usually I would apply a sliding window to this input, so I get the following time series entities out of it:
class SensorTimeSeries
{
IList<SensorReading> Input;
SensorReading Output;
}
So I get a List<SensorTimeSeries> out of the List<SensorReading>, with, say, 5 input data points and the corresponding output (the 6th reading) to predict, for a total of 1,000,000 - 5 entries.
My question now is: how do I create a GRU model for this? How do I train and evaluate it later on?
To reiterate, my training data would be a List<SensorTimeSeries> with 999,995 entries, each containing 5 readings as input and one as the label to be predicted.
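The sliding-window step described in this post can be sketched as follows. It covers only the pre-processing, not numl's GRU API, and it works on bare temperature values rather than SensorReading objects to stay self-contained. Window and SlidingWindow are hypothetical names. With an input size of k, a series of N values yields N - k windows under this definition.

```csharp
using System;
using System.Collections.Generic;

// One training example: k consecutive inputs and the value that follows them.
public class Window
{
    public double[] Input { get; set; }
    public double Output { get; set; }
}

public static class SlidingWindow
{
    public static List<Window> Build(IList<double> series, int inputSize)
    {
        if (inputSize <= 0)
        {
            throw new ArgumentOutOfRangeException("inputSize");
        }

        var windows = new List<Window>();
        // The last usable start index is the one whose output is the final element.
        for (int start = 0; start + inputSize < series.Count; start++)
        {
            var input = new double[inputSize];
            for (int j = 0; j < inputSize; j++)
            {
                input[j] = series[start + j];
            }
            windows.Add(new Window { Input = input, Output = series[start + inputSize] });
        }
        return windows;
    }
}

public static class Program
{
    public static void Main()
    {
        var series = new List<double> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        var windows = SlidingWindow.Build(series, 5);
        Console.WriteLine(windows.Count);     // prints 5
        Console.WriteLine(windows[0].Output); // prints 5
    }
}
```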
When you have a list of training data in which a particular feature property's values range from 0 to 80, and you then attempt to predict on an instance where that property is set to 100, the DecisionTreeGenerator model prediction throws an exception. I'm not sure if this is by design, but I thought that even though the training data never contained the value used in the prediction, the model would at least know that it is greater than all the values it has been trained on.
You can find the offending test in my pull request. The Test that illustrates the issue is called:
ArbitraryPrediction_Test_With_Feature_Value_Greater_Than_Trained_Instances
Seth,
Could you please fix the Load functionality for KernelPerceptron? Thanks.
This code fails:
var data = new List<MyData>();
for (var i = 0; i < 100; i++)
{
data.Add(new MyData { Prop1 = i, Prop2 = i + 1, Prop3 = i + 2, Result = i % 2 == 0 });
}
var descriptor = Descriptor.Create<MyData>();
var kernel = new RBFKernel(3);
var generator = new KernelPerceptronGenerator(kernel) { Descriptor = descriptor };
var learningModel = Learner.Learn(data, 0.80, 1000, generator);
Console.WriteLine(learningModel);
learningModel.Model.Save("model.mdl");
// THIS FAILS ----
learningModel.Model.Load("model.mdl");
Error:
There is an error in XML document (2, 2). ---> System.MissingMethodException: No parameterless constructor defined for this object.
at System.RuntimeTypeHandle.CreateInstance(RuntimeType type, Boolean publicOnly, Boolean noCheck, Boolean& canBeCached, RuntimeMethodHandleInternal& ctor, Boolean& bNeedSecurityCheck)
at System.RuntimeType.CreateInstanceSlow(Boolean publicOnly, Boolean skipCheckThis, Boolean fillCache, StackCrawlMark& stackMark)
at System.Activator.CreateInstance(Type type, Boolean nonPublic)
at System.Activator.CreateInstance(Type type)
at numl.Supervised.Perceptron.KernelPerceptronModel.ReadXml(XmlReader reader) in c:\projects\numl\numl\Supervised\Perceptron\KernelPerceptronModel.cs:line 63
at System.Xml.Serialization.XmlSerializationReader.ReadSerializable(IXmlSerializable serializable, Boolean wrappedAny)
at Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderKernelPerceptronModel.Read1_KernelPerceptronModel()
--- End of inner exception stack trace ---
at System.Xml.Serialization.XmlSerializer.Deserialize(XmlReader xmlReader, String encodingStyle, XmlDeserializationEvents events)
at System.Xml.Serialization.XmlSerializer.Deserialize(Stream stream)
at numl.Utils.Xml.Load(String file, Type t) in c:\projects\numl\numl\Utils\Xml.cs:line 94
at numl.Supervised.Model.Load(String file) in c:\projects\numl\numl\Supervised\Model.cs:line 75
I'd love to use this library as part of a .NET Standard project, as well as .NET Core.
Thank you.
I'm trying to do a really basic linear regression (Z = 2 * X + 1) prediction using NuML. Given the data is so linear I can't understand why the predicted value is so far off unless I am doing something wrong. I have the target class
public class Sample
{
public float V { get; set; }
public float X { get; set; }
public float Y { get; set; }
public float Z { get; set; }
public Func<float, float, float, float> OutputStrategy { get; set; }
public Sample(Func<float, float, float, float> outputStrategy)
{
OutputStrategy = outputStrategy;
}
public void Seed(int i)
{
V = (float) i;
X = (float) 2 * i;
Y = (float) 3 * i;
Z = OutputStrategy(V, X, Y);
}
}
and I have the NuML code to set up the source values and predict an answer for an arbitrary new data point:
NB: The output strategy is a simple 2 * A + 1. I've also tried it with multivariate analysis, and the prediction is even further off.
public static void Main(string[] args)
{
// Generate sample data
int sampleSize = 1000;
Sample[] samples = new Sample[sampleSize];
Func<float, float, float, float> outputStrategy = (A, B, C) => 2 * A + 1;
for (int i = 0; i < sampleSize; i++)
{
samples[i] = new Sample(outputStrategy);
samples[i].Seed(i);
}
// calculate model
var generator = new LinearRegressionGenerator();
var descriptor = Descriptor.New("Samples")
.With("V").As(typeof(float))
.With("X").As(typeof(float))
.With("Y").As(typeof(float))
.Learn("Z").As(typeof(float));
generator.Descriptor = descriptor;
var model = Learner.Learn(samples, 0.6, 50, generator);
// Use prediction
var targetSample = new Sample(outputStrategy);
targetSample.Seed(sampleSize + 1);
var predictedSample = model.Model.Predict(targetSample);
var predictedValue = predictedSample.Z;
var actualValue = outputStrategy(targetSample.V, targetSample.X, targetSample.Y);
Console.Write("Predicted Value = {0}, Actual Value = {1}, Difference = {2} {3:0.00}%", predictedValue, actualValue, actualValue - predictedValue, (decimal) (actualValue - predictedValue) / (decimal) predictedValue * 100M);
Console.ReadKey();
}
This gives a difference of about 0.5%, which was surprising considering the line is completely straight. I have tried using different percentages of the dataset for training and different numbers of iterations, but it makes no difference to the output.
If I use an even slightly more complicated model, I get much worse predictive capabilities. If I use logistic regression, the predicted output of Z is always 1?!
Do you have any documentation, other than the API References?
I like your framework but, I must admit, it is hard to use without any documentation.
Thanks in advance!
I am using version 0.9.14-beta of the NuGet package for ASP.NET Core, and I am trying to load the saved JSON of a model. For DecisionTreeModel, the LoadJson method is not implemented.
Any help?
Not a big deal, but I'm wondering if this is indicative of some feature that was not implemented:
In Learner.cs, line 92, the variable total is not used.
Create standardization around unsupervised models.
Hi Seth,
I've run numl in the profiler that comes with Visual Studio and found this line of code to cause a lot of CPU utilization:
DecisionTreeGenerator.cs, line 212, inside GetBestSplit()
Activator.CreateInstance(ImpurityType);
It's essentially a call to a costly Activator.CreateInstance from within a "for" loop. This could potentially be optimized by taking it out of the loop, or even calling it once and caching, if ImpurityType property is not meant to be assigned.
I'm not seeing any calls to the setter outside the DecisionTreeGenerator class, so it could potentially be made private, e.g.
public Type ImpurityType { get; private set; }
Thanks.
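If a fresh instance is needed on every loop iteration, so that a single cached instance is not an option, one alternative to calling Activator.CreateInstance each time is to compile a constructor delegate once and reuse it. This is a general .NET sketch, not numl's code; InstanceFactory is a hypothetical name, and it assumes the type has a public parameterless constructor.

```csharp
using System;
using System.Linq.Expressions;
using System.Text;

public static class InstanceFactory
{
    // Builds a delegate equivalent to "() => (object)new T()" for the given type.
    // Compiling costs time up front, so the result should be created once and reused.
    public static Func<object> Compile(Type type)
    {
        var construct = Expression.New(type); // requires a public parameterless constructor
        var asObject = Expression.Convert(construct, typeof(object));
        return Expression.Lambda<Func<object>>(asObject).Compile();
    }
}

public static class Program
{
    public static void Main()
    {
        // Created once, outside the hot loop.
        Func<object> create = InstanceFactory.Compile(typeof(StringBuilder));

        int created = 0;
        for (int i = 0; i < 1000; i++)
        {
            object instance = create(); // cheap compared with reflection per call
            if (instance is StringBuilder)
            {
                created++;
            }
        }
        Console.WriteLine(created); // prints 1000
    }
}
```

Whether this is worth it over simply hoisting one instance out of the loop depends on whether the impurity object carries state between calls, which the post does not say.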
Use label detection to select appropriate models
I needed a guid property, and I implemented it for my solution. But I was thinking that maybe someone else might be interested in that. Should I prepare a PR for that? Or do you want to keep the core stuff as slim as possible?
Hi,
When I serialize a KNNModel, I get an exception on K.ToString("r").
If I change it to K.ToString("d"), it seems to work fine.
Could you help me, please?
public override void WriteXml(XmlWriter writer)
{
writer.WriteAttributeString("K", K.ToString("d"));
Xml.Write(writer, Descriptor);
Xml.Write(writer, X);
Xml.Write(writer, Y);
}
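For context on why "d" works where "r" does not: the round-trip ("R") format specifier is defined for floating-point types and BigInteger, and integer types reject it with a FormatException. Assuming K is an int (the post does not show its declaration), the behaviour can be reproduced without numl; FormatCheck is a hypothetical helper.

```csharp
using System;
using System.Globalization;

public static class FormatCheck
{
    // Returns true when formatting the int with the given specifier throws FormatException.
    public static bool Throws(int value, string format)
    {
        try
        {
            value.ToString(format, CultureInfo.InvariantCulture);
            return false;
        }
        catch (FormatException)
        {
            return true;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        int k = 5;
        Console.WriteLine(FormatCheck.Throws(k, "r")); // prints True
        Console.WriteLine(FormatCheck.Throws(k, "d")); // prints False
        Console.WriteLine(k.ToString("d", CultureInfo.InvariantCulture)); // prints 5
    }
}
```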
I am thinking of moving all serialization to json. Any thoughts?
I tried to load a saved model, but JsonReader.ReadJson threw System.TypeLoadException: Cannot find type numl.Tests.Data.Iris.
var data = Iris.Load();
var description = Descriptor.Create<Iris>();
var generator = new DecisionTreeGenerator(50);
var model = generator.Generate(description, data) as DecisionTreeModel;
model.Save(AppDomain.CurrentDomain.BaseDirectory + "/model_cache/test.json");
var learntmodel = JsonReader.ReadJson(System.IO.File.ReadAllText(AppDomain.CurrentDomain.BaseDirectory + "/model_cache/test.json"));
Add the ability to solve any problem using genetic algorithms. This should be extensible, to allow custom problems and solving heuristics to be defined. Properties would include population growth rate, crossover rate, elitism, and other genetic metrics.
Use case: an ARIMA-type regression algorithm using moving genetic solvers, allowing time series predictors to modulate over time.
[V2, 0.0016]
|- 0 ≤ x < 49.5
| [V1, 0.0000]
| |- 1 ≤ x < 1.01
| | +(L, 1)-----------------------------<<<< this is supposed to be "s"
|- 49.5 ≤ x < 99.01
| +(L, 1)
[Test]
public void ValueObject_Test_With_Yield_Enumerator()
{
var data = ValueObject.GetData();
var generator = new DecisionTreeGenerator()
{
Descriptor = Descriptor.Create<ValueObject>()
};
var decisionTree = new DecisionTreeGenerator();
var model = generator.Generate(data);
var o = new ValueObject() { V1 = 1, V2 =10 };
var os = model.Predict<ValueObject>(o).R;
Assert.AreEqual("l".Sanitize(), os);
}
Doing anything with the library that uses StringFeature causes the application to just sit idly recursively allocating memory until it has consumed all system memory (if you allow it to), regardless of the data set length, string size, trainer parameters, etc.
I've installed numl from NuGet and there is no Reccomendation class in it.
When using the description method in numl the Descriptor object throws an exception when a Feature or Label attribute is defined on a complex type property. Properties can be complex types from external libraries thus preventing numl attribute usage.
For example given the below type:
public class Foo
{
[Feature]
public Bar One { get; set; }
[Label]
public bool IsOK { get; set; }
}
public class Bar
{
public int A { get; set; }
public int B { get; set; }
public int C { get; set; }
}
The descriptor would throw an error when converting the type Bar to a double. The same applies to nullable-type properties.
A suggested implementation is to use property path(s) in the Feature attribute. This would allow the Descriptor to extract one or more sub properties from the complex type.
Suggested Implementation:
public class Foo
{
[FeatureSelector("Bar.A", "Bar.B", "Bar.C")]
public Bar One { get; set; }
[Label]
public bool IsOK { get; set; }
}
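To make the property-path idea concrete, here is a minimal reflection sketch of how a dotted path could be resolved to a value. It is not numl code and FeatureSelector is only a proposal in this post; PropertyPath is a hypothetical helper, and the paths here are written relative to the Foo instance ("One.A") rather than in the "Bar.A" form used in the suggestion.

```csharp
using System;
using System.Reflection;

public class Bar
{
    public int A { get; set; }
    public int B { get; set; }
    public int C { get; set; }
}

public class Foo
{
    public Bar One { get; set; }
    public bool IsOK { get; set; }
}

public static class PropertyPath
{
    // Walks a dotted path such as "One.A" through public instance properties.
    // Returns null if an intermediate value on the path is null.
    public static object Resolve(object root, string path)
    {
        object current = root;
        foreach (string name in path.Split('.'))
        {
            if (current == null)
            {
                return null;
            }
            PropertyInfo property = current.GetType().GetProperty(name);
            if (property == null)
            {
                throw new ArgumentException("No public property '" + name + "' on " + current.GetType().Name);
            }
            current = property.GetValue(current, null);
        }
        return current;
    }
}

public static class Program
{
    public static void Main()
    {
        var foo = new Foo { One = new Bar { A = 1, B = 2, C = 3 }, IsOK = true };
        foreach (string path in new[] { "One.A", "One.B", "One.C" })
        {
            // Each resolved value could then be converted to a double feature.
            Console.WriteLine(Convert.ToDouble(PropertyPath.Resolve(foo, path)));
        }
    }
}
```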
The link to the documentation in the readme is broken and the main site only has a title on the documentation page.
I would love to read the documentation if there is some.
Hi!
I'm using the example provided in the "Getting Started" section, and I'm constantly getting a NullReferenceException when using the Learner. The exception occurs on Ject.GetCtor. If I substitute the Parallel.For on the Learner for a normal For, the exceptions disappear.
System.NullReferenceException was unhandled by user code
HResult=-2147467261
Message=Object reference not set to an instance of an object.
Source=mscorlib
StackTrace:
at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
at System.Collections.Generic.Dictionary`2.set_Item(TKey key, TValue value)
at numl.Utils.Ject.GetCtor(Type type) in c:\projects\numl\numl\Utils\Ject.cs:line 182
at numl.Utils.Ject.Create(Type type) in c:\projects\numl\numl\Utils\Ject.cs:line 195
at numl.Supervised.DecisionTree.DecisionTreeGenerator.GetBestSplit(Matrix x, Vector y, List`1 used) in c:\projects\numl\numl\Supervised\DecisionTree\DecisionTreeGenerator.cs:line 215
at numl.Supervised.DecisionTree.DecisionTreeGenerator.BuildTree(Matrix x, Vector y, Int32 depth, List`1 used) in c:\projects\numl\numl\Supervised\DecisionTree\DecisionTreeGenerator.cs:line 112
at numl.Supervised.DecisionTree.DecisionTreeGenerator.Generate(Matrix x, Vector y) in c:\projects\numl\numl\Supervised\DecisionTree\DecisionTreeGenerator.cs:line 92
at numl.Learner.GenerateModel(IGenerator generator, Matrix x, Vector y, IEnumerable`1 examples, Double trainingPct, Int32 total) in c:\projects\numl\numl\Learner.cs:line 145
at numl.Learner.<>c__DisplayClasse.<Learn>b__d(Int32 i) in c:\projects\numl\numl\Learner.cs:line 111
at System.Threading.Tasks.Parallel.<>c__DisplayClassf`1.<ForWorker>b__c()
InnerException:
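The stack trace shows Dictionary`2.Insert failing inside Ject.GetCtor while running under Parallel.For. Dictionary<TKey, TValue> is not safe for concurrent writes, so a cache shared between parallel iterations can corrupt itself in this way. A common remedy is ConcurrentDictionary.GetOrAdd, sketched below. This illustrates the pattern only; CtorCache is a hypothetical stand-in, not numl's actual Ject implementation or its eventual fix.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;
using System.Text;
using System.Threading.Tasks;

public static class CtorCache
{
    // Thread-safe cache from a type to its public parameterless constructor.
    private static readonly ConcurrentDictionary<Type, ConstructorInfo> Cache =
        new ConcurrentDictionary<Type, ConstructorInfo>();

    public static ConstructorInfo GetCtor(Type type)
    {
        // GetOrAdd may run the factory more than once under contention,
        // but only one result is stored and every caller gets that one.
        return Cache.GetOrAdd(type, t => t.GetConstructor(Type.EmptyTypes));
    }

    public static object Create(Type type)
    {
        return GetCtor(type).Invoke(null);
    }
}

public static class Program
{
    public static void Main()
    {
        int created = 0;
        Parallel.For(0, 10000, i =>
        {
            object instance = CtorCache.Create(typeof(StringBuilder));
            if (instance != null)
            {
                System.Threading.Interlocked.Increment(ref created);
            }
        });
        Console.WriteLine(created); // prints 10000
    }
}
```

A plain lock around the Dictionary access would also work; GetOrAdd simply avoids writing the locking by hand.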
Hi, there is another issue, and I'm not sure if it is a bug or if I am doing something wrong:
// the value should be "very weak", but I am getting "very good"
namespace Console
{
public class Value
{
[Feature]
public double V1 { get; set; }
[Feature]
public double V2 { get; set; }
[Feature]
public double V3 { get; set; }
[Label]
public string R { get; set; }
public static IEnumerable<Value> GetData()
{
Random r = new Random();
for (int i = 0; i < 8000; i++)
{
string label = "s";
double v1 = r.Next(0, 100);
double q = r.Next ( 0,100) ;
double math = r.Next ( 0,100) ;
double avrge = ( q+i+math ) /3 ;
if ( avrge >80 )
label ="very good" ;
else if ( avrge > 60 )
label ="good" ;
else if ( avrge > 50 )
label= "middle" ;
else if ( avrge>30 )
label="weak" ;
else
label="very weak" ;
yield return new Value { V1 = v1, V2 = q, V3 = math, R = label };
}
}
}
class Program
{
static void Main(string[] args)
{
var data = Value.GetData().ToList() ;
var description = Descriptor.Create<Value>();
var generator = new DecisionTreeGenerator(512,50, description, null);
var model = generator.Generate(description, data);
Value currentValue = new Value { V1 = 1, V2 = 20, V3=30 };
var pay = model.Predict<Value>(currentValue).R;
}
}
}
It seems I keep running across the Levenshtein Edit Distance algorithm. In reviewing the code for this library I found other distance algorithms implemented, and wondered what might have caused this one to be left out. (See Issue #16)
Given the importance of the algorithm and the relevance to information processing and loosely to machine learning, there are a few challenges to be considered.
Given a class:
class SimilarWordEntry
{
[Feature] public string WordA { get; set; }
[Feature] public string WordB { get; set; }
// Where LevenshteinDistance is computed following the guidelines at: http://en.wikipedia.org/wiki/Levenshtein_distance
[Feature, LevenshteinDistance("WordA","WordB")]
public int EditDistance { get; set; }
[Feature] public int WordALength { get { return WordA.Length; } }
[Feature] public int WordBLength { get { return WordB.Length; } }
// Where Soundex is computed following the guidelines at: http://en.wikipedia.org/wiki/Soundex
[Feature, Soundex("WordA")] public string WordASoundex { get; set; }
[Feature, Soundex("WordB")] public string WordBSoundex { get; set; }
[Label] public double SimilarityScore { get; set; }
}
With this syntax we can convert two input features into a third feature extracted by use of the algorithm. The syntax is simply convenience over populating a Feature property with the result of the edit distance algorithm.
Thoughts?
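For reference, the edit distance itself is short to implement. The sketch below is the standard dynamic-programming formulation with two rolling rows, independent of numl; the LevenshteinDistance attribute in the post is a proposal, and this code only shows what such an attribute would have to compute.

```csharp
using System;

public static class Levenshtein
{
    // Minimum number of single-character insertions, deletions,
    // and substitutions needed to turn a into b.
    public static int Distance(string a, string b)
    {
        if (a == null) throw new ArgumentNullException("a");
        if (b == null) throw new ArgumentNullException("b");

        // previous[j] holds the distance between the first i-1 chars of a
        // and the first j chars of b; current is the row being filled.
        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int substitution = previous[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                current[j] = Math.Min(substitution, Math.Min(insertion, deletion));
            }
            // Swap rows so the row just filled becomes "previous".
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[b.Length];
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Levenshtein.Distance("kitten", "sitting")); // prints 3
        Console.WriteLine(Levenshtein.Distance("", "abc"));           // prints 3
    }
}
```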
I must be doing something horribly wrong. I'm trying to use naive bayes to categorize some data based on input via a spreadsheet. The code runs, but it seems to give me the same category regardless of my input. I've appended some sample code below (gloss over the helper call that reads Excel; that just takes the sheet & converts to a .net dataset). Also, I'm limiting the training data to the first 100 rows. Any more than that and the program gets an out of memory exception!
So I have two questions:
[Serializable]
public class PeerCategory
{
[Feature]
public string PeerCategoryDesc { get; set; }
[Label]
public string CAPeerCategory { get; set; }
}
[TestFixture]
//[Ignore]
public class BayesTest
{
[Test]
public void TestExcel()
{
string pathToFile = @"C:\data\Training Data.xlsx";
using (var reader = new FileReaderExcel(pathToFile))
{
int index = reader.GetSheetIndexByName("V6");
var dt = reader.GetDataTableFromExcelSheet(index, true);
var peerCatList = (
from DataRow row in dt.Rows
select new PeerCategory()
{
PeerCategoryDesc = row.Field<string>(0),
CAPeerCategory = row.Field<string>(1)
}).Take(100).ToList();
var width = peerCatList.Select(t => t.CAPeerCategory).Distinct().Count();
IGenerator generator = new NaiveBayesGenerator(width);
generator.Descriptor = Descriptor.Create<PeerCategory>();
LearningModel learned = Learner.Learn(peerCatList, 0.8, 1000, generator);
IModel model = learned.Model;
double accuracy = learned.Accuracy;
var value1 = model.Predict(GetItem("paint"));
var value2 = model.Predict(GetItem("flower"));
var value3 = model.Predict(GetItem("sofa"));
var value4 = model.Predict(GetItem("desk"));
var value5 = model.Predict(GetItem("bones"));
// value1 thru 5 will be the same category over and over again!!
}
}
public PeerCategory GetItem(string desc)
{
var item = new PeerCategory()
{
PeerCategoryDesc = desc,
CAPeerCategory = string.Empty
};
return item;
}
}
Hi Seth,
Had an idea to do a REALLY simple attempt to learn a function that I would have ordinarily implemented as a switch statement, just for the mind-bending. :-)
The code was written to be run in LinqPad.
void Main()
{
Assembly.GetAssembly(typeof(Learner)).Dump();
var gen = new numl.Supervised.NeuralNetwork.NeuralNetworkGenerator();
gen.Descriptor = Descriptor.Create<WindDirection>();
var learned = Learner.Learn(WindDirection.TrainingData(), 16/20, 1, gen);
var model = learned.Model;
var accuracy = learned.Accuracy.Dump();
var windDir = new WindDirection(350, null);
model.Predict(windDir); //Uncomment this if you are running this in LinqPad .Dump("Prediction");
}
// Define other methods and classes here
public class WindDirection {
[Feature]
public double Degrees { get; set; }
[StringLabel()]
public String Direction { get; set; }
public WindDirection(double degrees, string direction)
{
this.Degrees = degrees;
this.Direction = direction;
}
public static WindDirection[] TrainingData()
{
return new[] {
// Training Values
new WindDirection(0, "N" ),
new WindDirection(22.5, "NNE"),
new WindDirection(45, "NE" ),
new WindDirection(67.5, "ENE"),
new WindDirection(90, "E" ),
new WindDirection(112.5, "ESE"),
new WindDirection(135, "SE" ),
new WindDirection(157.5, "SSE"),
new WindDirection(180, "S" ),
new WindDirection(202.5, "SSW"),
new WindDirection(225, "SW" ),
new WindDirection(247.5, "WSW"),
new WindDirection(270, "W" ),
new WindDirection(292.5, "WNW"),
new WindDirection(315, "NW" ),
new WindDirection(337.5, "NNW"),
// Testing Values
new WindDirection(22.5, "NNE"),
new WindDirection(112.5, "ESE"),
new WindDirection(11.25, "N"),
new WindDirection(359-11.25, "N")
};
}
}
However, running the above Main function results in the following IndexOutOfRangeException.
at numl.Model.StringProperty.Convert(Double val) in c:\projects\numl\numl\Model\StringProperty.cs:line 109
at numl.Learner.GenerateModel(IGenerator generator, Matrix x, Vector y, IEnumerable`1 examples, Double trainingPct) in c:\projects\numl\numl\Learner.cs:line 169
at numl.Learner.<>c__DisplayClasse.<Learn>b__d(Int32 i) in c:\projects\numl\numl\Learner.cs:line 110
at System.Threading.Tasks.Parallel.<>c__DisplayClassf`1.<ForWorker>b__c()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.<>c__DisplayClass11.<ExecuteSelfReplicating>b__10(Object param0)
I have looked at the relevant files here and I can't find the cause of the exception at the relevant lines.
I am using the NuGet 0.8.17.0 build when I am getting this exception.
Have I failed to follow the documentation correctly?
Thoughts?
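One detail in the Main function above that may or may not be related to the exception: the training percentage is written as 16/20. Both operands are integer literals, so C# performs integer division and the value passed to Learner.Learn is 0, not 0.8. The sketch below demonstrates only the arithmetic; TrainingSplit is a hypothetical name, and how numl reacts to a training percentage of 0 is not something this example shows.

```csharp
using System;

public static class TrainingSplit
{
    // Both operands are int literals: the division happens in integer
    // arithmetic and the result (0) is only then converted to double.
    public static double AsWritten()
    {
        return 16 / 20;
    }

    // Making one operand a double forces floating-point division.
    public static double Intended()
    {
        return 16.0 / 20;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(TrainingSplit.AsWritten() == 0.0); // prints True
        Console.WriteLine(TrainingSplit.Intended() > 0.79);  // prints True (the value is 0.8)
    }
}
```

Writing the argument as 16.0/20 or 0.8 would at least rule this out as a factor.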
I am totally new to machine learning. I am trying to figure out where to dive in.
My job is to categorize images, specifically patient labels. I will need to categorize common indicators on the label. (Though not my scenario, a decent example may be patient race: African-American, Caucasian, etc.)
But the image will also have barcodes and other numbers on them that are not the same from image to image (and should be ignored by the system).
To add one more level of complexity, there are many different kinds of patient labels. All of them will have the "race" info on them, but in different fonts and in different places. (And maybe even abbreviated differently.)
Is NuML able to do this kind of thing? If so I will dig in and learn it.
While taking a more thorough look into the Linear Regression implementation, I'm seeing that Accuracy tends to report as 0%. Here is the code that is currently being used in (dev branch) Learner.cs:
// testing
object[] test = GetTestExamples(testingSlice, examples);
double accuracy = 0;
for (int j = 0; j < test.Length; j++)
{
// items under test
object o = test[j];
// get truth
var truth = Ject.Get(o, descriptor.Label.Name);
// if truth is a string, sanitize
if (descriptor.Label.Type == typeof(string))
truth = StringHelpers.Sanitize(truth.ToString());
// make prediction
var features = descriptor.Convert(o, false).ToVector();
var p = model.Predict(features);
var pred = descriptor.Label.Convert(p);
// assess accuracy
if (truth.Equals(pred))
accuracy += 1;
}
// get percentage correct
accuracy /= test.Length;
Then this is consumed later in Learner.Best:
var q = from m in models
where m.Accuracy == (models.Select(s => s.Accuracy).Max())
select m;
return q.FirstOrDefault();
So basically, it iterates through the testing slice, makes the prediction, and then assesses the success of the prediction against the truth. But currently, it has only one implementation of assessment: truth.Equals(pred). This is then consumed in Learner.Best(), which returns the model with the highest (max) value of Accuracy.
This approach means that unless two doubles are exactly equal (not likely except for possibly trivial data), LinearRegression will always produce 0% Accuracy.
I wanted to abstract this out, but I wanted to get thoughts on how to approach this, as there are a lot of possible routes forward.
We could...
Create a TestOption object/hierarchy and pass this in, defaulting to TruthEqualsPredictionTestOption.
Change the Learner singleton implementation from a static class to a singleton instance, in which case we could subclass Learner with overrides for different methods.
I personally waver between the TestOption approach and the Learner changes. Each has its pros and cons.
With the TestOption approach, we can easily avoid breaking changes. But we would then have to change the Learner.Best() method depending on what the options instance is, and we end up with a switch statement, or worse, an if-then-else chain.
With the Learner singleton changes, we could more cleanly address the various capabilities of the Learner class. But this would probably entail breaking changes. I could actually write an ILearnerThing interface that has a default implementation that uses the current static class as-is, and this would avoid breaking changes. However, going forward, we would have a fragmented approach to using the library. Also, this would possibly (probably?) incorporate using DI of some sort, which brings along with it more design decisions, i.e. complexity.
So, those are my thoughts. The goal is simply to get some accuracy with LinearRegression and do it in such a way that if we get a good statistician personage (or maybe one of you already is), it gives them easy access to a more robust assessment of accuracy without getting too YAGNI.
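To illustrate what an alternative assessment could look like for regression, here are two common options in a minimal sketch: accuracy within a tolerance, and root-mean-square error (RMSE). Neither is part of numl's Learner; RegressionScore and its methods are hypothetical names, and the inputs are plain lists of truth and predicted values.

```csharp
using System;
using System.Collections.Generic;

public static class RegressionScore
{
    // Fraction of predictions that fall within +/- tolerance of the truth.
    public static double ToleranceAccuracy(IList<double> truth, IList<double> predicted, double tolerance)
    {
        CheckSameLength(truth, predicted);
        int hits = 0;
        for (int i = 0; i < truth.Count; i++)
        {
            if (Math.Abs(truth[i] - predicted[i]) <= tolerance)
            {
                hits++;
            }
        }
        return (double)hits / truth.Count;
    }

    // Root-mean-square error: lower is better, 0 means a perfect fit.
    public static double Rmse(IList<double> truth, IList<double> predicted)
    {
        CheckSameLength(truth, predicted);
        double sumOfSquares = 0;
        for (int i = 0; i < truth.Count; i++)
        {
            double error = truth[i] - predicted[i];
            sumOfSquares += error * error;
        }
        return Math.Sqrt(sumOfSquares / truth.Count);
    }

    private static void CheckSameLength(IList<double> truth, IList<double> predicted)
    {
        if (truth.Count == 0 || truth.Count != predicted.Count)
        {
            throw new ArgumentException("Inputs must be non-empty and of equal length.");
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var truth = new List<double> { 1, 2, 3, 4 };
        var predicted = new List<double> { 1.001, 2.5, 3, 4 };
        Console.WriteLine(RegressionScore.ToleranceAccuracy(truth, predicted, 0.01) == 0.75); // prints True
        Console.WriteLine(RegressionScore.Rmse(truth, predicted) < 0.26);                     // prints True
    }
}
```

One consequence for the design discussion above: with an error measure such as RMSE, Learner.Best would have to pick the minimum rather than the maximum, which is one more reason the comparison would need to live with the scoring strategy.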
Is there any strategy or way to generate a model from a huge amount of data, or to update a prebuilt model with new data?
P.S. Thank you so much, I love this project!
I can successfully create a model following the sample code. Now I need to make a prediction with my unit data. How can I do this?
I tried to download your project and open it in VS2015, but I always get an error and can't open the project. Do you have any guidance that can help me open your projects? Thanks.
Hi there,
My model consists of decimal properties labeled as [Feature] and [Label], and I'm getting the following exception in Ject.cs. It appears DoubleConverter.CanConvertTo(typeof(decimal)) returns false, causing this exception.
As a workaround I've switched all properties to Double, but I'm wondering if anything can be done about it.
System.InvalidCastException was unhandled by user code
HResult=-2147467262
Message=Cannot convert 20 to Decimal
Source=numl
StackTrace:
at numl.Utils.Ject.Convert(Double val, Type t) in z:\Builds\work\6fc28cb662d1e0f0\numl\Utils\Ject.cs:line 287
at numl.Model.Property.Convert(Double val) in z:\Builds\work\6fc28cb662d1e0f0\numl\Model\Property.cs:line 79
at numl.Supervised.DecisionTree.DecisionTreeGenerator.BuildLeafNode(Double val) in z:\Builds\work\6fc28cb662d1e0f0\numl\Supervised\DecisionTree\DecisionTreeGenerator.cs:line 243
at numl.Supervised.DecisionTree.DecisionTreeGenerator.BuildTree(Matrix x, Vector y, Int32 depth, List`1 used) in z:\Builds\work\6fc28cb662d1e0f0\numl\Supervised\DecisionTree\DecisionTreeGenerator.cs:line 172
at numl.Supervised.DecisionTree.DecisionTreeGenerator.Generate(Matrix x, Vector y) in z:\Builds\work\6fc28cb662d1e0f0\numl\Supervised\DecisionTree\DecisionTreeGenerator.cs:line 91
at numl.Learner.GenerateModel(IGenerator generator, Matrix x, Vector y, IEnumerable`1 examples, Double trainingPct) in z:\Builds\work\6fc28cb662d1e0f0\numl\Learner.cs:line 143
at numl.Learner.<>c__DisplayClasse.b__d(Int32 i) in z:\Builds\work\6fc28cb662d1e0f0\numl\Learner.cs:line 110
at System.Threading.Tasks.Parallel.<>c__DisplayClassf`1.b__c()
InnerException:
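For reference, a minimal sketch of one way the decimal case could be handled. This is not numl's Ject.Convert; FromDouble is a hypothetical helper. It assumes the converter check described above declines decimal, and falls back to Convert.ChangeType, which does support double-to-decimal through IConvertible.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;

public static class DecimalConversionDemo
{
    // Hypothetical helper, not numl's Ject.Convert: falls back to
    // Convert.ChangeType when the TypeConverter declines the target type.
    public static object FromDouble(double val, Type target)
    {
        TypeConverter converter = TypeDescriptor.GetConverter(typeof(double));
        if (converter.CanConvertTo(target))
            return converter.ConvertTo(val, target);

        // Covers decimal and other IConvertible targets. Throws
        // OverflowException if the value does not fit the target type.
        return Convert.ChangeType(val, target, CultureInfo.InvariantCulture);
    }
}
```

Calling DecimalConversionDemo.FromDouble(20d, typeof(decimal)) yields a boxed decimal 20. Double-to-decimal conversion can lose precision for values with many significant digits, so switching the properties to double, as in the workaround above, remains a reasonable choice.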
Hi folks,
Great work here!
I've recently run into a null-reference exception on this line due to not providing a proper Descriptor. I realize that is clear when reading the code; it just isn't all that clear when you create a model, load it from a serialized state, and oh, you also need a descriptor to go with it.
Anyway, a pre-condition check on the Descriptor would, I think, help clarify things for new folks.
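A rough sketch of the kind of pre-condition check I have in mind. The Descriptor and LoadedModel classes here are simplified, hypothetical stand-ins for illustration, not numl's actual types; the point is only that a missing Descriptor fails fast with a descriptive message instead of a null-reference exception deeper in the call.

```csharp
using System;

// Hypothetical stand-ins for illustration; not numl's actual types.
public sealed class Descriptor
{
    public string Name { get; set; }
}

public sealed class LoadedModel
{
    public Descriptor Descriptor { get; set; }

    public string Predict(object item)
    {
        if (item == null) throw new ArgumentNullException(nameof(item));

        // Guard clause: report the missing Descriptor explicitly.
        if (Descriptor == null)
            throw new InvalidOperationException(
                "Descriptor is not set. Assign the Descriptor used for training before calling Predict.");

        // Placeholder for the real prediction logic.
        return "predicted with " + Descriptor.Name;
    }
}
```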
Started working on this (with a ton of help from @bdschrisk)
Hi,
I created 2 models (Decision Tree and Neural Network) on the classification dataset that I have. The models were created fine, but I get an error when doing the prediction with either of these models (using code Model.Predict<Class>(Object)). The error is due to the conversion of double to Int32 for features that have large numbers (10000 or more). I created a class for the data record objects and specified double for all features. However, the numl code (Ject class) automatically converts double to Int32 for reasons that I could not understand. Looking at the Ject, Property and Model classes, I could not work out where the type T is specified and why it is specified as Int32. Here are the error message and stack trace. I appreciate your help.
Value was either too large or too small for an Int32.
at System.Convert.ToInt32(Double value)
at System.Double.System.IConvertible.ToInt32(IFormatProvider provider)
at System.Convert.ChangeType(Object value, Type conversionType, IFormatProvider provider)
at System.Convert.ChangeType(Object value, Type conversionType)
at numl.Utils.Ject.Convert(Double val, Type t) in c:\projects\numl\numl\Utils\Ject.cs:line 313
at numl.Model.Property.Convert(Double val) in c:\projects\numl\numl\Model\Property.cs:line 79
at numl.Supervised.Model.Predict(Object o) in c:\projects\numl\numl\Supervised\Model.cs:line 38
at numl.Supervised.Model.Predict[T](T o) in c:\projects\numl\numl\Supervised\Model.cs:line 48
at WebSecurityRatingApp.Default.btnTwitter_Click(Object sender, EventArgs e)
Hi Seth,
Please add an awaitable Learner.LearnAsync method that accepts a CancellationToken as one of its parameters and supports graceful cancellation. It doesn't need to be instantaneous.
Thank you.
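To illustrate the request, here is a minimal sketch of the shape such a method might take. AsyncLearner and the trainOnce delegate are hypothetical; trainOnce stands in for a single training run that returns its accuracy, and nothing here is part of numl. Cancellation is checked between runs, so it is graceful rather than instantaneous.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncLearner
{
    // Hypothetical wrapper: 'trainOnce' stands in for one training run
    // and returns that run's accuracy. Not part of numl.
    public static Task<double> LearnAsync(
        Func<double> trainOnce, int repeat, CancellationToken token)
    {
        if (trainOnce == null) throw new ArgumentNullException(nameof(trainOnce));

        return Task.Run(() =>
        {
            // Stays at negative infinity if 'repeat' is zero or less.
            double best = double.NegativeInfinity;
            for (int i = 0; i < repeat; i++)
            {
                // Cancellation is observed between runs only; a run that
                // has already started is allowed to finish.
                token.ThrowIfCancellationRequested();
                best = Math.Max(best, trainOnce());
            }
            return best;
        }, token);
    }
}
```

A caller would await AsyncLearner.LearnAsync(...) and catch OperationCanceledException to handle a cancelled run.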
I am building software to classify texts about health.
I have many texts classified as "positive" or "negative", and these are the texts for my training.
The user will enter new texts, and my software will evaluate them and say whether each text is "positive" or "negative".
Today I use Weka and Naïve Bayes for this, but I would like a framework specific to .NET.
I found numl, but I have not found a sample for this.
Is it possible?
Thanks
This example uses the Tennis.cs sample data from the numl.net website:
Tennis[] data = Tennis.GetData();
IGenerator generator = new NaiveBayesGenerator(2);
generator.Descriptor = Descriptor.Create<Tennis>();
LearningModel learned = Learner.Learn(data, 0.80, 1000, generator);
IModel model = learned.Model;
double accuracy = learned.Accuracy;
Tennis t = new Tennis
{
    Outlook = Outlook.Sunny,
    Temperature = Temperature.High,
    Windy = false
};
Tennis predictedVal = model.Predict(t);
DisplayMessage(OutputTextbox, String.Format("Result: {0} (accuracy {1}%)", predictedVal.Play, accuracy * 100));
The result of this is that 3 predictions were true and 5 were false -- the correct response should be false. Here's the output:
7/29/2014 9:21:11 AM: Result: False (accuracy 100%)
7/29/2014 9:21:12 AM: Result: False (accuracy 100%)
7/29/2014 9:21:13 AM: Result: False (accuracy 100%)
7/29/2014 9:21:14 AM: Result: True (accuracy 100%)
7/29/2014 9:21:15 AM: Result: False (accuracy 100%)
7/29/2014 9:21:16 AM: Result: True (accuracy 100%)
7/29/2014 9:21:17 AM: Result: True (accuracy 100%)
7/29/2014 9:21:18 AM: Result: False (accuracy 100%)
All the other supervised generators produce the expected result. I've only been using this library for about an hour though, so hopefully I'm doing something wrong. Thanks!
Edit: correct response should be false, I accidentally wrote true.
Any label whose type implements IEnumerable throws an exception saying that it needs the EnumerableFeature attribute, even though I need a Label, not a Feature.