Research Interests
Direct Links
Relevant Links

 

FastSet

FastSet is a MatLab toolbox, designed to efficiently perform set operations substituting the standard MatLab's intersect, union, setdiff, setxor, ismember, hist and unique methods. They are at least 3 to 10 times faster and some consume significantly less memory than the original built-in methods. The gain increases significantly as the datasets grow larger.

Table of content

Toolbox Description

Motivation

MatLab does not naturally support sets and uses vectors to store them. Although, standard MatLab set routines return proper sets (vectors are sorted and each value does not appear more than once), they do not require their inputs to fulfil these properties. It is assumed that users can manipulate sets between recurrent calls to the set routines.

This safe approach is very unfortunate for those who work with large sets.

Typical users who perform operations with large sets might never alter these sets manually or can easily guarantee that they remain sorted and unique at all times. If this is the case, set operations could be optimized to take advantage of the definite order. That's what the FastSet library for MatLab is about.

In addition to this basic assumption (and unlike the standard MatLab routines), FastSet routines respect the input data type. Many set users would opt to have their data stored in vectors of int16 or uint32. Such vectors consume significantly less memory and can be processed faster (due to the fact that CPUs are much quicker with integer operations). Furthermore, FastSet naturally supports integer 64bit types without converting the data to double (as MatLab set routines do). Implicate conversion to double performed by Matlab lead to a nasty bug which yields wrong results when neighbouring 64bit values are mapped to the same floating point number.

Finally, the FastSet routines are aggressively parallelized (on Windows platform only) and take advantage of the modern multicore CPUs. In some cases parallelization results in a tenfold increase in speed.

Organization

The FastSet toolbox consists of seven mex (compiled C++) files which cover all basic set operations. These functions are nearly perfectly compatible with the standard MatLab's routines and the existing code can be upgraded quite easily.

The functions are:

  • fast_intersect_sorted

  • fast_union_sorted

  • fast_setdiff_sorted

  • fast_setxor_sorted

  • fast_ismember_sorted

  • fast_unique

  • fast_frequency

FastSet toolbox is implemented in C++ and compiled as mex-files . One disadvantage of mex-files is that they do not contain help recognizable by MatLab help system. Therefore, each file is accompanied by a corresponding m-file containing a function prototype and detailed help. MatLab help system can extract help from these m-files (when for example "help fast_union_sorted" is called from the command prompt), while MatLab engine knows to call the mex-file when it is actually executed.

Mex-files are executables called by MatLab engine and have to match the bitness of the MatLab version. We provide the code built for any 7.x version of MatLab for Windows. It supports both 32 and 64-bit versions through .mexw32 and .mexw64 mex-files built for each method. 

Eben du Toit ported and built the package for 64-bt version of MatLab for Mac. Many thanks, Eben!

The sources are compatible with GCC and the functions can be easily built on any platform. However, the parallelized sections depend on proprietary Microsoft libraries and are substituted by non-optimized code compatible with other compilers.

Install (MS Windows)

Installation of the toolbox couldn't be easier. Download the library, unpack the archive into a directory on your computer and add that directory to MatLab's path (File->Set Path... in MatLab menu). Don't forget to save the new path (otherwize it will revet to the original in the next MatLab session).

I have aggresively parallelized the code where possible using microsoft's PPL (parallel performance library). For these methods to run, you would need to have Visual Studion redistributables installed on your system. They could already be there (check if MSVCR110.DLL and MSVCP110.DLL are present in your Windows/System32 folder). If not, you'll need to install the appropriate package from Microsoft's web site.

Notations

 

Limitations

  • The major FastSet limitation arises from the assumption which happens to provide the greates efficiency gain. Most FastSet routines assume that their inputs are sorted and unique. They will fail if any of the inputs does not satisfy this assumption. FastSet was designed to be as efficient as possible, hence the validity of inputs is not even verified. If you are not sure whether your data is sorted and unique, pass them through the MatLab's standard unique function. 
    The good news is that it is usually not easy to violate this requirement. All set functions (whether they are standard MatLab's set routines of FastSet's ones) return valid sets. So, porting existing code is quite simple.
  • FastSet routines support any numeric format - 8,16, 32 and 64 -bit integers as well as 64 and 80-bit floating point numbers. However, at this point they do support string arrays (cell arrays of strings). Let me know if you really need them and I'll consider implementing this feature as well.
  • Another incompatibility to the standard MatLab routins is that FastSet functions do not accept the optional (and rarely used) 'row' parameter.

Download

I have compiled and tested FastSet on both 32 and 64-bit MS Windows versions of MatLab. The toolbox comes with a source code, so if any of you, guys cares to build it for other OS, please, send me the binaries and I'll put them here as well. Porting to other compilers should be straightforward and I'd be happy to provide whatever assistence you might need.

Date Version Platform Size Link
Feb 26, 2013 1.2

MS Windows

MatLab 32 and 64 bits

212 Kb FastSet.v1.2.Win.rar
Feb 26, 2013 1.2

IOS,

MatLab 64 bits

61 Kb FastSet.v1.2.Mac.zip
Feb 26, 2013 1.2

Windows Sources,

MS Visual Studio 2012 (as well as gcc)

390 Kb FastSet.Sources.V1.2.zip

 

Previous versions:

Date Version Platform Size Link
April 4, 2012 1.1

Windows, MatLab 2011a, 32 and 64 bits

125 Kb FastSet.v11.rar
April 4, 2012 1.1

Windows Sources, MS Visual Studio 2010

1.3Mb FastSet_sources.v11.rar

 

 


Footer