2.7 Subsets
In addition to accessors for individual elements, columns, or rows in a data frame (Section 2.6), class DataFrame provides a large number of indexers and member functions for accessing sub-frames containing any arbitrary subset of rows, columns, or both (Section 2.8).
Such indexers and methods accept the NMath types Slice and Range to indicate sets of row or column indices with constant spacing, as well as abstract values like Slice.All for indexing all elements.
In addition, NMath Stats introduces a new class called Subset. Like a Slice or Range, a Subset represents a collection of indices that can be used to view a subset of data from another data structure. Unlike a Slice or Range, however, a Subset need not be continuous, or even ordered. It is simply an arbitrary collection of indices.
This section describes the Subset class.
Subset instances can be constructed in a variety of ways. One constructor simply accepts an array of integers:
Code Example – C#
var sub = new Subset( new int[] { 5, 4, 0, 12 } );
Another constructor accepts an ICollection whose elements are all System.Int32.
A very useful constructor takes an array of boolean values and constructs a Subset containing the indices of all true elements in the array. This can be used, for example, to create a subset from a DataFrame containing the indices of the rows or columns that meet a certain criterion.
Thus, this code creates a subset of row indices containing those rows where the value in column 2 is greater than the value in column 3:
Code Example – C#
var bArray = new bool[ df.Rows ];
for ( int i = 0; i < df.Rows; i++ )
{
  bArray[i] = ( df[2][i] > df[3][i] );
}
var rowIndices = new Subset( bArray );
This Subset could be used to access the sub-frame containing only those rows that meet the criterion, as described in Section 2.8.
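The mapping from a boolean array to a set of indices can be illustrated without NMath. The following is a minimal sketch in plain C#, not the library's implementation; the class name MaskSketch and the method IndicesOfTrue() are hypothetical, and the ascending order of the result is an assumption.

Code Example – C#
```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only; not part of NMath.
public static class MaskSketch
{
  // Returns the positions of all true elements in mask.
  // Ascending order is an assumption of this sketch.
  public static int[] IndicesOfTrue( bool[] mask )
  {
    var indices = new List<int>();
    for ( int i = 0; i < mask.Length; i++ )
    {
      if ( mask[i] )
      {
        indices.Add( i );
      }
    }
    return indices.ToArray();
  }
}
```

For instance, MaskSketch.IndicesOfTrue( new bool[] { true, false, false, true } ) yields the indices 0 and 3.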
A Subset can also be constructed from an array of other subsets. The subsets are simply concatenated. To create a sorted Subset of the unique indices, you can call Unique() on the constructed Subset (see below).
Lastly, constructors are provided that construct subsets with constant spacing, like slices and ranges. For instance, this code creates a subset starting at 2, with 5 total elements, and a step size of 1:
Code Example – C#
var sub = new Subset( 2, 5, 1 );
Class Subset provides the following read-only properties:
● First gets the first index in the subset.
● Length gets the total number of indices in the subset.
● Indices gets the underlying array of integers.
● Last gets the last index in the subset.
Class Subset provides an indexing operator for getting and setting element values. Thus, subset[i] returns the ith element of the underlying array of integers.
Code Example – C#
sub[ 3 ] = 4;
NOTE—Indexing starts at 0.
The Get( i ) method safely gets the index at a given position by looping around the end of the subset if i exceeds the length of the subset:
Code Example – C#
var sub = new Subset( new int[] { 3, 4, 5, 8, 9 } );
int index = sub.Get( 5 ); // index = 3
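The wrap-around behavior described for Get() matches modular indexing. The following is a minimal sketch in plain C#, assuming a non-negative position; the class name WrapSketch is hypothetical, and how the library treats negative positions or empty subsets is not covered here.

Code Example – C#
```csharp
using System;

// Hypothetical helper, for illustration only; not part of NMath.
public static class WrapSketch
{
  // Returns the element at position i, looping around the end
  // of the array when i is greater than or equal to its length.
  public static int Get( int[] indices, int i )
  {
    if ( indices.Length == 0 )
    {
      throw new ArgumentException( "indices must not be empty" );
    }
    if ( i < 0 )
    {
      throw new ArgumentOutOfRangeException( "i" );
    }
    return indices[ i % indices.Length ];
  }
}
```

With the indices 3, 4, 5, 8, 9, position 5 wraps to position 0 and the sketch returns 3, in agreement with the example above.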
You can also create a Subset of a Subset using the indexing operator. For instance:
Code Example – C#
var sub1 = new Subset( new int[] { 1, 3, 4, 7, 9 } );
var sub2 = new Subset( new int[] { 0, 2, 4 } );
Subset sub3 = sub1[ sub2 ]; // sub3.Indices = 1, 4, 9
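Indexing one subset by another amounts to composing two index arrays: each element of the second array selects a position in the first. The following is a minimal sketch in plain C#, not the library's implementation; the class name ComposeSketch and the method Select() are hypothetical.

Code Example – C#
```csharp
using System;

// Hypothetical helper, for illustration only; not part of NMath.
public static class ComposeSketch
{
  // result[k] = outer[ inner[k] ] for every position k in inner.
  public static int[] Select( int[] outer, int[] inner )
  {
    var result = new int[ inner.Length ];
    for ( int k = 0; k < inner.Length; k++ )
    {
      result[k] = outer[ inner[k] ];
    }
    return result;
  }
}
```

Applied to the arrays 1, 3, 4, 7, 9 and 0, 2, 4, the sketch returns 1, 4, 9, the same values shown for sub3.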
Operator == tests for equality of two subsets, and returns true if both subsets are the same length and all elements are equal; otherwise, false. Following the convention of the .NET Framework, if both objects are null, they test equal. Operator != returns the logical negation of ==. The Equals() member function also tests for equality.
Arithmetic Operations on Subsets
NMath Stats provides overloaded arithmetic operators for subsets with their conventional meanings for those .NET languages that support them, and equivalent named methods for those that do not. Table 3 lists the equivalent operators and methods.
Operator | Equivalent Named Method
+ | Add()
- | Subtract()
* | Multiply()
/ | Divide()
Unary - | Negate()
++ | Increment()
-- | Decrement()
& | Intersection()
| | Union()
The Append() method adds an index to the end of a subset:
Code Example – C#
sub.Append( 5 );
Remove() removes the first occurrence of a given index from a subset. Reverse() reverses the indices of a subset. Unique() sorts the indices in a subset and removes any repetitions. Thus:
Code Example – C#
var sub = new Subset( new int[] { 0,5,3,2,7,5 } );
sub.Remove( 3 ); // sub.Indices = 0, 5, 2, 7, 5
sub.Reverse();   // sub.Indices = 5, 7, 2, 5, 0
sub.Unique();    // sub.Indices = 0, 2, 5, 7
Similarly, ToReverse() returns a new subset containing the indices of a subset in the reverse order; ToUnique() returns a new subset containing the sorted indices of a subset, with all repetitions removed.
The Repeat() method creates a new subset by repeating the source subset until a given length is reached. For instance:
Code Example – C#
var sub1 = new Subset( 3 );          // sub1.Indices = 0,1,2
Subset sub2 = sub1.Repeat( 11 );     // sub2.Indices = 0,1,2,0,1,2,0,1,2,0,1
The Split() method splits a source subset into an arbitrary array of subsets. The parameters are the number of subsets into which to split the source subset, and another subset the same length as the source subset, the ith element of which indicates into which bin to place the ith element of the source subset. For example:
Code Example – C#
var sub = new Subset( 10 ); // sub.Indices = 0,1,2,3,4,5,6,7,8,9
Subset bins = new Subset( new int[] { 3, 1, 0, 2, 2, 1, 1, 2, 3, 0 } );
Subset[] subsetArray = sub.Split( 4, bins );
// subsetArray[0] = 2,9
// subsetArray[1] = 1,5,6
// subsetArray[2] = 3,4,7
// subsetArray[3] = 0,8
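The binning rule behind Split() can be sketched in plain C#: the ith element of the source goes into the bin named by the ith element of the bin array. This is an illustration under that reading of the description, not the library's implementation; the class name SplitSketch is hypothetical, and the sketch assumes every bin value lies between 0 and binCount - 1.

Code Example – C#
```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only; not part of NMath.
public static class SplitSketch
{
  // Places source[i] into bin bins[i], preserving source order within each bin.
  public static int[][] Split( int[] source, int binCount, int[] bins )
  {
    if ( source.Length != bins.Length )
    {
      throw new ArgumentException( "source and bins must have the same length" );
    }
    var buckets = new List<int>[ binCount ];
    for ( int b = 0; b < binCount; b++ )
    {
      buckets[b] = new List<int>();
    }
    for ( int i = 0; i < source.Length; i++ )
    {
      buckets[ bins[i] ].Add( source[i] );
    }
    var result = new int[ binCount ][];
    for ( int b = 0; b < binCount; b++ )
    {
      result[b] = buckets[b].ToArray();
    }
    return result;
  }
}
```

Run on the source 0 through 9 with the bins 3, 1, 0, 2, 2, 1, 1, 2, 3, 0, the sketch produces the four groups listed in the example above.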
Lastly, the ToString() method returns a comma-delimited string list of the indices in a subset.
The static GetGroupings() methods on Subset create subsets from factors. One overload of this method accepts a single Factor and returns an array of subsets containing the indices for each level of the given factor. Another overload accepts two Factor objects and returns a two-dimensional jagged array of subsets containing the indices for each combination of levels in the two factors. See Section 2.10 for more information on factors and the GetGroupings() methods.
The static method Sample( n ) returns a random shuffle of the indices 0 through n-1. The returned Subset can be used to randomly reorder the rows in a data frame, as described in Section 2.8.
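A random shuffle of 0 through n-1 can be produced in several ways. The following is a minimal sketch using the Fisher-Yates algorithm in plain C#; which algorithm the library actually uses is not stated here, and the class name ShuffleSketch and the Random parameter are hypothetical additions for illustration.

Code Example – C#
```csharp
using System;

// Hypothetical helper, for illustration only; not part of NMath.
public static class ShuffleSketch
{
  // Returns the integers 0..n-1 in random order (Fisher-Yates shuffle).
  public static int[] Sample( int n, Random rng )
  {
    var indices = new int[n];
    for ( int i = 0; i < n; i++ )
    {
      indices[i] = i;
    }
    for ( int i = n - 1; i > 0; i-- )
    {
      int j = rng.Next( i + 1 ); // 0 <= j <= i
      int tmp = indices[i];
      indices[i] = indices[j];
      indices[j] = tmp;
    }
    return indices;
  }
}
```

Passing a seeded Random, as in ShuffleSketch.Sample( 10, new Random( 42 ) ), makes the ordering reproducible between runs on the same runtime.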