Data Science: Introduction to Orange Tool

Introduction

Orange is an open-source toolkit for data visualization, machine learning, and data mining. It features a visual programming front-end for exploratory data analysis and interactive data visualization. Its components are called widgets, and they range from simple data visualization, subset selection, and pre-processing to empirical evaluation of learning algorithms and predictive modeling. More details are available in the official Orange documentation.

Orange Widgets

Orange groups its widgets into categories, including:

  • Data
  • Visualize
  • Model
  • Evaluate
  • Unsupervised

Widgets offer essential functionality, such as:

  • Displaying a data table and selecting features
  • Reading data
  • Training predictors and comparing learning algorithms
  • Visualizing data elements

There are three ways to add a widget to the canvas:

  1. Double-click the widget in the toolbox.
  2. Drag the widget from the toolbox onto the canvas.
  3. Right-click the canvas to open the widget menu.

The toolbox contains all the widgets.

In the canvas, double-click the File widget to open it. You can then load your own dataset or choose one of the documentation datasets. Here I load the iris.tab dataset, which comes with the Orange tool.

File and Data Info

  1. Drag the File widget to the canvas.
  2. Drag the Data Info widget to the canvas.
  3. On the right side of the File widget there is a semicircular shape. Press the mouse button on it and drag to the Data Info widget.
  4. Notice that a link labeled “Data” now connects the two widgets.

Double-click the Data Info widget to see information about the dataset.
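
As a rough illustration of the kind of summary Data Info shows, the sketch below counts instances, classifies features as numeric or categorical, and counts missing values. It uses plain Python rather than Orange itself; the `data_info` function, the column names, and the toy rows are assumptions made for this example, not Orange's implementation or the real iris data.

```python
# Hypothetical toy rows for illustration; NOT the real iris measurements.
ROWS = [
    {"length": 5.0, "width": 3.4, "species": "A"},
    {"length": 6.1, "width": None, "species": "B"},
    {"length": 5.8, "width": 2.7, "species": "B"},
]


def data_info(rows, target):
    """Summarize row count, feature types, and missing values."""
    columns = list(rows[0].keys())
    features = [c for c in columns if c != target]
    kinds = {}
    for name in features:
        # Ignore missing entries when deciding the feature type.
        values = [r[name] for r in rows if r[name] is not None]
        numeric = all(isinstance(v, (int, float)) for v in values)
        kinds[name] = "numeric" if numeric else "categorical"
    missing = sum(1 for r in rows for c in columns if r[c] is None)
    return {
        "instances": len(rows),
        "features": kinds,
        "target": target,
        "missing_values": missing,
    }


info = data_info(ROWS, target="species")
print(info)
```

The widget itself reports more than this, so treat the output only as a mental model of what "information about the dataset" means.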

File and Data Table

  1. Drag a Data Table widget to the canvas.
  2. Connect the File widget to the Data Table widget.
  3. Double-click the Data Table widget to see the rows and columns of the dataset.

Distribution

The Distribution widget displays the value distribution of discrete or continuous attributes. If the data contains a class variable, distributions may be conditioned on the class.

  1. Drag a Distribution widget to the canvas.
  2. Connect the File widget to the Distribution widget.
  3. Double-click the Distribution widget to see the visualization.
  4. At the top left, select a different variable and check how the distribution changes.
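
To make "distributions conditioned on the class" concrete, here is a minimal sketch in plain Python. It bins a continuous attribute and counts values per bin, first overall and then separately for each class. The bin width, the helper names, and the sample values are assumptions chosen for illustration; this is not how Orange computes or draws its distributions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical (value, class) pairs; NOT the real iris measurements.
SAMPLES = [
    (4.9, "A"), (5.1, "A"), (5.4, "A"),
    (6.0, "B"), (6.3, "B"), (5.8, "B"), (7.1, "B"),
]


def bin_start(value, width=1.0):
    """Return the lower edge of the bin that contains value."""
    return math.floor(value / width) * width


def distribution(samples, width=1.0):
    """Count how many values fall into each bin, ignoring the class."""
    return dict(Counter(bin_start(v, width) for v, _ in samples))


def distribution_by_class(samples, width=1.0):
    """Count values per bin separately for each class label."""
    by_class = defaultdict(Counter)
    for value, label in samples:
        by_class[label][bin_start(value, width)] += 1
    return {label: dict(counts) for label, counts in by_class.items()}


overall = distribution(SAMPLES)
per_class = distribution_by_class(SAMPLES)
print("overall:", overall)
print("per class:", per_class)
```

Selecting a different variable in the widget corresponds to running the same counting over a different column.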

Scatter Plot

The Scatter Plot widget provides a 2-dimensional scatter plot visualization for both continuous and discrete-valued attributes. The data is displayed as a collection of points: the value of the x-axis attribute determines each point's horizontal position, and the value of the y-axis attribute determines its vertical position.

  1. Drag a Scatter Plot widget to the canvas.
  2. Connect the File widget to the Scatter Plot widget, in the same way as for the Distribution widget.
  3. Double click on the Scatter Plot widget to see the visualization.
  4. You can change the x-axis and y-axis based on the features available.
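
The mapping from rows to points described above can be sketched in a few lines of plain Python. The `to_points` helper, the column names, and the toy rows are hypothetical and only illustrate the idea; they are not part of Orange.

```python
# Hypothetical toy rows for illustration; NOT the real iris measurements.
ROWS = [
    {"length": 5.0, "width": 3.4, "label": "A"},
    {"length": 6.1, "width": 2.9, "label": "B"},
    {"length": 5.8, "width": 2.7, "label": "B"},
]


def to_points(rows, x_attr, y_attr, color_attr=None):
    """Turn each row into an (x, y, color) point for the chosen axes."""
    return [
        (row[x_attr], row[y_attr], row[color_attr] if color_attr else None)
        for row in rows
    ]


# Choosing different attributes for the axes yields a different set of points.
points = to_points(ROWS, "length", "width", color_attr="label")
swapped = to_points(ROWS, "width", "length")

for x, y, color in points:
    print(f"x={x} y={y} color={color}")
```

Changing the x-axis or y-axis in the widget amounts to calling the helper with different attribute names.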

If you are unsure which features to select, click “Find Informative Projections”. In the dialog that opens, click “Start” and select any item from the list. The scatter plot will be updated based on your selection.

To load external data, select the URL option in the File widget and paste the link to the external dataset.
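
For readers curious what loading a dataset from a link involves, the sketch below downloads delimited text and parses it into rows using only the Python standard library. The function names and the sample text are assumptions for this example, and the code does not reproduce how the File widget reads data. The download function is defined but not called here, so the example runs without network access.

```python
import csv
import io
import urllib.request


def parse_delimited(text, delimiter=","):
    """Parse delimited text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


def load_from_url(url, delimiter=","):
    """Download delimited text from url and parse it (needs network access)."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return parse_delimited(text, delimiter)


# Hypothetical sample text standing in for a downloaded file.
SAMPLE_TEXT = "length,width,species\n5.0,3.4,A\n6.1,2.9,B\n"
rows = parse_delimited(SAMPLE_TEXT)
print(rows)
```

Note that `csv.DictReader` returns every value as a string, so numeric columns would still need converting before analysis.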

Conclusion

We have explored the Orange tool and visualized the dataset that we loaded. We tried out the Distribution and Scatter Plot widgets using the Iris dataset.