Deep feature synthesis

Deep Feature Synthesis is an algorithm developed by James Max Kanter and Kalyan Veeramachaneni and described in their paper "Deep Feature Synthesis: Towards Automating Data Science Endeavors".[1]

Definition

Quoting the above paper: "Deep Feature Synthesis is an algorithm that automatically generates features for relational datasets. In essence, the algorithm follows relationships in the data to a base field, and then sequentially applies mathematical functions along that path to create the final feature."
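
The idea in the quotation can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tables, the field names, and the helper functions `aggregate` and `synthesize` are all hypothetical, and only a handful of aggregation functions are assumed. The sketch follows the relationship items → orders → customers and stacks one aggregation on top of another, which yields "deep" features such as MEAN(SUM(price)).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: customers <- orders <- items.
# Each child row carries the key of its parent row.
orders = [
    {"order_id": 1, "customer_id": "a"},
    {"order_id": 2, "customer_id": "a"},
    {"order_id": 3, "customer_id": "b"},
]
items = [
    {"order_id": 1, "price": 10.0},
    {"order_id": 1, "price": 30.0},
    {"order_id": 2, "price": 20.0},
    {"order_id": 3, "price": 5.0},
]

# Assumed set of aggregation functions; a real system may use others.
AGGREGATIONS = {"SUM": sum, "MEAN": mean, "MAX": max, "COUNT": len}


def aggregate(child_rows, key, field, func):
    """Group child_rows by key and apply func to the values of field."""
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[key]].append(row[field])
    return {k: func(values) for k, values in groups.items()}


def synthesize(child_rows, key, fields):
    """Apply every aggregation to every field.

    Returns a mapping {feature_name: {parent_key: value}}.
    """
    return {
        f"{name}({field})": aggregate(child_rows, key, field, func)
        for field in fields
        for name, func in AGGREGATIONS.items()
    }


# Depth 1: roll item prices up to their order.
order_features = synthesize(items, "order_id", ["price"])

# Attach the new features to the orders so they can be aggregated again.
enriched_orders = [
    {**o, **{name: vals[o["order_id"]] for name, vals in order_features.items()}}
    for o in orders
]

# Depth 2: roll the order-level features up to the customer.
customer_features = synthesize(enriched_orders, "customer_id", list(order_features))

print(customer_features["MEAN(SUM(price))"])  # {'a': 30.0, 'b': 5.0}
```

With four aggregations applied at two levels, the sketch produces 16 customer-level features from a single base field, which suggests how quickly the feature set grows with depth.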

Practical Results

Kanter and Veeramachaneni implemented the Deep Feature Synthesis algorithm in their Data Science Machine and entered its automatically generated predictions in three data science competitions:

Their system competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers' "Data Science Machine" finished ahead of 615. In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.[2]

Characteristics

Little to no human intervention.

Results in hours rather than months.

Relies on a relational (SQL) schema and normalized table relationships.
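
The reliance on a relational schema can be sketched as follows. This is an illustration under stated assumptions, not the Data Science Machine's code: the `customers` and `orders` tables and the helpers `child_relationships` and `aggregate_features` are hypothetical. The sketch uses SQLite's `PRAGMA foreign_key_list` to read the declared foreign keys and builds one aggregate query per relationship it finds, so no human has to name the join columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 40.0), (2, 1, 20.0), (3, 2, 5.0);
""")


def child_relationships(conn, child_table):
    """Return (parent_table, child_column, parent_column) per foreign key."""
    rows = conn.execute(f"PRAGMA foreign_key_list({child_table})").fetchall()
    # Row layout: id, seq, table, from, to, on_update, on_delete, match
    return [(r[2], r[3], r[4]) for r in rows]


def aggregate_features(conn, child_table, field):
    """Run one GROUP BY query for each relationship declared in the schema."""
    features = {}
    for parent, child_col, parent_col in child_relationships(conn, child_table):
        query = (
            f"SELECT p.{parent_col}, SUM(c.{field}), AVG(c.{field}), COUNT(*) "
            f"FROM {parent} AS p JOIN {child_table} AS c "
            f"ON c.{child_col} = p.{parent_col} "
            f"GROUP BY p.{parent_col} ORDER BY p.{parent_col}"
        )
        features[parent] = conn.execute(query).fetchall()
    return features


features = aggregate_features(conn, "orders", "amount")
print(features["customers"])  # [(1, 60.0, 30.0, 2), (2, 5.0, 5.0, 1)]
```

If the tables were not normalized, or the foreign keys were not declared, there would be no relationship for such a procedure to follow, which is why the schema matters.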

Applications

Quickly create feature sets of predictive value.

Critique

The process of synthesizing features from relational data is known as propositionalization and has been known since at least 1991.[3] The algorithm employed in Deep Feature Synthesis was first described by Knobbe in 2001 [4] and is known as RollUp. RollUp was later enhanced in PRORED.[5] A commercial version of RollUp is sold under the name Safarii.

References

  1. Kanter, Max; Veeramachaneni, Kalyan. "Deep Feature Synthesis: Towards Automating Data Science Endeavors" (PDF).
  2. Hardesty, Larry. "System that replaces human intuition with algorithms outperforms human teams".
  3. Kodratoff, Y., ed. (1991). Machine Learning: EWSL-91. Proceedings of the European Working Session on Learning, Porto, Portugal, March 6–8, 1991. Berlin: Springer-Verlag. ISBN 0-387-53816-X.
  4. Knobbe, Arno (2001). "Propositionalisation and Aggregates". Principles of Data Mining and Knowledge Discovery: 277–288. doi:10.1007/3-540-44794-6_23.
  5. Gjorgjioski, Valentin. "Stochastic propositionalization of relational data using aggregates" (PDF).

Further reading

Relational data mining

This article is issued from Wikipedia, version of 3/9/2016. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply for the media files.