A command-line tool for data-driven fuzzy modelling
If you don't like reading, a video tutorial is available at https://www.youtube.com/watch?v=ZA_NADMyMsM
HABFUZZ comes without a graphical user interface. Still, the user's only task is to prepare the input file appropriately before running the software. When you run HABFUZZ, whether on Windows, Mac OS or Linux, you will see the classic MS-DOS-style command window, which welcomes you and prompts you to run the software. All calculations are performed by the software and two files are produced: 'suitability.txt', containing the predicted habitat suitability for each microhabitat combination, and 'log.txt', containing useful parameters of the selected process (the fuzzy logic or fuzzy rule-based Bayesian algorithm).
The processes of HABFUZZ
HABFUZZ is fully data-driven. It reads a training dataset and applies a 10-fold cross-validation process to inform you about the model's predictive accuracy, calibrated on this specific dataset. Then, based on the rules developed from the training dataset, HABFUZZ predicts the habitat suitability of any given test dataset. The fuzzy rules are data-driven and developed automatically by the software. Of course, since it is open source, you can always modify the rules subroutines of HABFUZZ or even write your own expert-judgement-based rules. HABFUZZ implements two different algorithms (the classic fuzzy logic algorithm and a fuzzy rule-based Bayesian algorithm), which share a common concept: fuzziness of inputs. You can select the one that yields the highest predictive accuracy on your training dataset.
The fuzzy algorithms
If you are a beginner, you can find a detailed description of fuzzy logic algorithms in Ross (2010) at http://onlinelibrary.wiley.com/book/10.1002/9781119994374.
However, the essential information you need to apply HABFUZZ is presented in the software's manual (see the documentation page) and in this short tutorial. In fuzzy logic, the initial values of the input data are converted to membership degrees through a process called fuzzification, which resembles classification but enables the user to account for possible uncertainty in the developed fuzzy classes (called membership functions). For example, if the values of water depth (D) range from 0.03 m to 1.5 m, you are pretty sure that D values between 0.03 m and 0.1 m can be defined as LOW, D values between 0.2 m and 0.5 m can be defined as MODERATE, and so on. But you are not sure whether a value of 0.12 m is LOW or MODERATE. Fuzzy logic lets you say numerically 'I am 70% sure that a value of 0.12 m can be defined as LOW and 30% sure it is MODERATE' by writing 'the membership degree of 0.12 m to the LOW fuzzy class is 0.7 and the membership degree of 0.12 m to the MODERATE class is 0.3'. This process is applied in HABFUZZ to every value of your dataset, for each variable. HABFUZZ fuzzifies all inputs into trapezoidal membership functions, with their class boundaries defined by the user (this is the only part of HABFUZZ that requires the user to intervene).
Fig. 1. Fuzzification of V and D
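To make the fuzzification step concrete, here is a minimal Python sketch of a trapezoidal membership function (illustrative only; HABFUZZ itself is distributed as .m files, and the corner points below are hypothetical, standing in for the user-defined class boundaries in fdeclarations.m):

def trapezoid(x, a, b, c, d):
    # Membership degree of x in a trapezoidal fuzzy class with corners a, b, c, d
    # (full membership between b and c, linear edges from a to b and from c to d).
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)      # ascending edge
    return (d - x) / (d - c)          # descending edge

# Hypothetical depth classes (m); the 0.10-0.20 m overlap is the fuzzy zone.
low_D      = lambda x: trapezoid(x, 0.00, 0.03, 0.10, 0.20)
moderate_D = lambda x: trapezoid(x, 0.10, 0.20, 0.50, 0.60)

print(low_D(0.12), moderate_D(0.12))  # roughly 0.8 and 0.2 with these corners

With differently placed corners, the same depth of 0.12 m could just as well come out as 70% LOW and 30% MODERATE, as in the example above; the split depends entirely on where the user places the trapezoid boundaries.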
Step 1 - Development of the rules database
After reading the training dataset, HABFUZZ develops the rules database based on the classes observed for each microhabitat combination and the relevant habitat suitability (K). The value of each variable of each microhabitat combination is classified into one of the following classes:
Flow velocity (V)
VERY LOW: 0 - 0.075 m/s
LOW: 0.075 - 0.175 m/s
MODERATE: 0.175 - 0.45 m/s
HIGH: 0.45 - 0.75 m/s
VERY HIGH: > 0.75 m/s

Water depth (D)
VERY SHALLOW: 0 - 0.125 m
SHALLOW: 0.125 - 0.325 m
MODERATE: 0.325 - 0.575 m
DEEP: 0.575 - 0.725 m
VERY DEEP: > 0.725 m

Temperature (T)
VERY LOW: 0 - 12.5 °C
LOW: 12.5 - 14 °C
MODERATE: 14 - 18 °C
HIGH: 18 - 24 °C
VERY HIGH: > 24 °C

Substrate type (S)
BOULDERS: 0.070
LARGE STONES: 0.050
SMALL STONES: 0.040
LARGE GRAVEL: 0.030
MEDIUM GRAVEL: 0.026
FINE GRAVEL: 0.024
SAND: 0.022
SILT: 0.020

Habitat suitability (K)
HIGH: 0.8 - 1
GOOD: 0.6 - 0.8
MODERATE: 0.4 - 0.6
POOR: 0.2 - 0.4
BAD: 0 - 0.2
The boundaries of each class are the same as those used for fuzzification and can be modified by the user (in the fdeclarations.m file). HABFUZZ creates a matrix with all the class combinations observed and the relevant K for each combination. From this point, HABFUZZ implements two different algorithms: (i) the typical fuzzy logic algorithm (Mamdani and Assilian, 1975) and (ii) the fuzzy rule-based Bayesian algorithm (Brookes et al., 2010).
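As a minimal illustration of this classification step (a Python sketch, not HABFUZZ's own .m code), a value can be mapped to its class by checking the boundaries listed above; only the velocity classes are shown here:

def classify(value, boundaries):
    # Return the label of the class whose [lower, upper) interval contains value.
    for label, (low, high) in boundaries.items():
        if low <= value < high:
            return label
    return None

# Velocity classes (m/s) from the list above; the open-ended upper class
# uses infinity as its upper bound.
V_CLASSES = {
    "VERY LOW":  (0.0, 0.075),
    "LOW":       (0.075, 0.175),
    "MODERATE":  (0.175, 0.45),
    "HIGH":      (0.45, 0.75),
    "VERY HIGH": (0.75, float("inf")),
}

print(classify(0.2, V_CLASSES))  # MODERATE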
The fuzzy logic algorithm
(Mamdani and Assilian, 1975)
In the fuzzy logic algorithm, when a microhabitat combination is observed more than once with different K, HABFUZZ calculates the overall K by (a) averaging, (b) taking the lowest K observed, or (c) taking the highest K observed. The user is prompted by HABFUZZ to select one of these three scenarios (called the average, worse and optimum scenarios, respectively) before the rules database is developed. The rules are then developed according to the following concept:
1. The values of V, D, T, S and K of each microhabitat combination in the training dataset are classified according to the classes illustrated above. Let's assume that we have three microhabitats:
(a) V = 0.2 m/s, D = 0.12 m, T = 13 oC, S = 0.070 and K = 0.65
(b) V = 0.3 m/s, D = 0.35 m, T = 19 oC, S = 0.050 and K = 0.24
(c) V = 0.32 m/s, D = 0.05 m, T = 13 oC, S = 0.070 and K = 0.45
HABFUZZ classifies these values and 'translates' them as
(a) IF V is MODERATE AND D is VERY SHALLOW AND T is LOW AND S is BOULDERS THEN K is GOOD.
(b) IF V is MODERATE AND D is MODERATE AND T is HIGH AND S is LARGE STONES THEN K is POOR.
(c) IF V is MODERATE AND D is VERY SHALLOW AND T is LOW AND S is BOULDERS THEN K is MODERATE.
You can see that combinations (a) and (c) give the same V, D, T and S classes but different K. In this case, HABFUZZ applies one of the three scenarios mentioned above to calculate a single K for this combination. If, for example, the average scenario is selected, K would be (0.65 + 0.45) / 2 = 0.55 (classified as MODERATE).
This process is repeated for all the microhabitat combinations available in the training dataset. Notice that a large number of rules may be developed; in HABFUZZ there are five classes of V, five classes of D, five classes of water temperature and eight classes of substrate, which equals 5 x 5 x 5 x 8 = 1000 rules!
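The rule-development step for the fuzzy logic algorithm can be pictured with the short Python sketch below (illustrative only; the three scenario names follow the prompt described above, and the helper is not part of HABFUZZ itself):

from collections import defaultdict
from statistics import mean

def build_rules(training, scenario="average"):
    # training: list of ((V_class, D_class, T_class, S_class), K) pairs.
    # Duplicate combinations are collapsed according to the chosen scenario.
    grouped = defaultdict(list)
    for combo, k in training:
        grouped[combo].append(k)
    collapse = {"average": mean, "worse": min, "optimum": max}[scenario]
    return {combo: collapse(ks) for combo, ks in grouped.items()}

training = [
    (("MODERATE", "VERY SHALLOW", "LOW", "BOULDERS"), 0.65),    # microhabitat (a)
    (("MODERATE", "MODERATE", "HIGH", "LARGE STONES"), 0.24),   # microhabitat (b)
    (("MODERATE", "VERY SHALLOW", "LOW", "BOULDERS"), 0.45),    # microhabitat (c)
]
print(build_rules(training, "average"))
# (a) and (c) share a class combination, so their K is (0.65 + 0.45) / 2 = 0.55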
Step 2 - Cross validation
The model's predictive accuracy is calculated using a 10-fold cross-validation process. The training dataset is divided into two parts: 90% of the data (microhabitat combinations) are randomly selected and the rules are developed from this 90%. The rules developed from this 90% are then used to predict K for the remaining 10%. This process is repeated 10 times (random selection of 90% and prediction of the remaining 10%) and finally the CCI index (correctly classified instances, %) is produced to inform the user about the model's predictive accuracy.
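A sketch of this validation loop is given below (Python, illustrative only; build_rules and predict are hypothetical helper names standing in for the internal HABFUZZ routines):

import random

def cross_validate(dataset, build_rules, predict, n_rounds=10):
    # dataset: list of (class_combination, observed_K_class) pairs.
    # Repeated random 90/10 split, as described above; returns the CCI (%).
    correct, total = 0, 0
    for _ in range(n_rounds):
        shuffled = random.sample(dataset, len(dataset))
        cut = int(0.9 * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        rules = build_rules(train)
        for combo, observed in test:
            if predict(rules, combo) == observed:
                correct += 1
            total += 1
    return 100.0 * correct / total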
Step 3 - Prediction of K in the test dataset
To predict the K of the test dataset (microhabitat combinations with unknown K), HABFUZZ applies the following procedure FOR EACH MICROHABITAT:
1. The V, D, T and S values of each test microhabitat are fuzzified and each value is replaced by a membership degree for each fuzzy class.
2. The whole training dataset is now used to develop the rules database.
3. HABFUZZ then refers to the rules database to derive the class combinations and the relevant K for each combination.
4. The rules corresponding to each combination are applied using the AND operator (minimum) (fig. 2a). The membership degree of each input value is used. For example, if we have a V of 0.18 m/s, which corresponds to a membership degree of 0.3 in the VERY LOW class, and a D of 0.12 m, corresponding to a membership degree of 0.8 in the SHALLOW class, then the AND operator says that the membership degree of K (let's say HIGH from the relevant rule) will be 0.3 (the minimum of 0.3 and 0.8).
Fig. 2a. Aggregation of inputs
5. The different K values observed, along with their membership degrees, are then aggregated and a defuzzification process is applied to calculate the final K value (fig. 3a); a minimal sketch of this step is given after fig. 3a below. The defuzzification algorithms implemented in HABFUZZ are (i) centroid, (ii) weighted average, (iii) maximum membership and (iv) mean of maximum.
Fig. 3a. Defuzzification
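Steps 4 and 5 can be sketched as follows (Python, illustrative only; the weighted-average option is shown, and the class scores are simply the midpoints of the K classes listed above):

K_SCORES = {"HIGH": 0.9, "GOOD": 0.7, "MODERATE": 0.5, "POOR": 0.3, "BAD": 0.1}

def fire_rule(input_memberships, k_class):
    # AND operator: the rule fires with the minimum membership degree of its inputs.
    return k_class, min(input_memberships)

def defuzzify_weighted_average(fired):
    # fired: list of (K class, firing strength) pairs from all matching rules.
    num = sum(K_SCORES[k] * w for k, w in fired)
    den = sum(w for _, w in fired)
    return num / den if den else None

fired = [fire_rule([0.3, 0.8], "HIGH"),       # e.g. V degree 0.3, D degree 0.8
         fire_rule([0.7, 0.2], "MODERATE")]   # a second, hypothetical rule
print(defuzzify_weighted_average(fired))      # (0.3*0.9 + 0.2*0.5) / 0.5 = 0.74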
A useful illustration of the processes applied in HABFUZZ is shown in the figure below. Please refer to the software's manual for a detailed description of each process.
The fuzzy rule-based Bayesian algorithm
(Brookes et al., 2010)
In the fuzzy rule-based Bayesian algorithm, when a microhabitat combination is observed more than once with different K, HABFUZZ keeps a record of all observations and assigns a probability value according to the number of times each K class is observed for the same class combination. The rules are developed according to the following concept:
1. The values of V, D, T, S and K of each microhabitat combination in the training dataset are classified according to the classes illustrated above. Let's assume that we have three microhabitats:
(a) V = 0.2 m/s, D = 0.12 m, T = 13 oC, S = 0.070 and K = 0.65
(b) V = 0.3 m/s, D = 0.35 m, T = 19 oC, S = 0.050 and K = 0.24
(c) V = 0.32 m/s, D = 0.05 m, T = 13 oC, S = 0.070 and K = 0.45
HABFUZZ classifies these values and 'translates' them as
(a) IF V is MODERATE AND D is VERY SHALLOW AND T is LOW AND S is BOULDERS THEN K is GOOD.
(b) IF V is MODERATE AND D is MODERATE AND T is HIGH AND S is LARGE STONES THEN K is POOR.
(c) IF V is MODERATE AND D is VERY SHALLOW AND T is LOW AND S is BOULDERS THEN K is MODERATE.
You can see that the combinations (a) and (c) give the same V, D, T and S classes but different K. In this case HABFUZZ assigns a probability of 0.5 to the GOOD K class (from the 'a' combination) and 0.5 to the MODERATE K class (from the 'c' combination) and these probabilities are used to calculate the final K (see below).
This process is repeated for all the microhabitat combinations available in the training dataset. Notice that a large number of rules may be developed; in HABFUZZ there are five classes of V, five classes of D, five classes of water temperature and eight classes of substrate, which equals 5 x 5 x 5 x 8 = 1000 rules!
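A minimal Python sketch of this counting step (illustrative only, not HABFUZZ's own .m code) shows how the probabilities arise from the observed frequencies:

from collections import Counter, defaultdict

def build_bayesian_rules(training):
    # training: list of ((V_class, D_class, T_class, S_class), K_class) pairs.
    # For each combination, return the probability of each observed K class.
    counts = defaultdict(Counter)
    for combo, k_class in training:
        counts[combo][k_class] += 1
    return {combo: {k: n / sum(c.values()) for k, n in c.items()}
            for combo, c in counts.items()}

training = [
    (("MODERATE", "VERY SHALLOW", "LOW", "BOULDERS"), "GOOD"),      # (a)
    (("MODERATE", "MODERATE", "HIGH", "LARGE STONES"), "POOR"),     # (b)
    (("MODERATE", "VERY SHALLOW", "LOW", "BOULDERS"), "MODERATE"),  # (c)
]
print(build_bayesian_rules(training))
# the shared combination gets {'GOOD': 0.5, 'MODERATE': 0.5}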
Step 2 - Cross validation
The model's predictive accuracy is calculated using a 10-fold cross-validation process. The training dataset is divided into two parts: 90% of the data (microhabitat combinations) are randomly selected and the rules are developed from this 90%. The rules developed from this 90% are then used to predict K for the remaining 10%. This process is repeated 10 times (random selection of 90% and prediction of the remaining 10%) and finally the CCI index (correctly classified instances, %) is produced to inform the user about the model's predictive accuracy.
Step 3 - Prediction of K in the test dataset
To predict the K values of the test dataset (microhabitat combinations with unknown K), HABFUZZ applies the following procedure FOR EACH MICROHABITAT:
1. The V, D, T and S values of each test microhabitat are fuzzified and each value is replaced by a membership degree for each fuzzy class.
2. The whole training dataset is now used to develop the rules database.
3. HABFUZZ then refers to the rules database to derive the class combinations and the relevant K for each combination.
4. The rules corresponding to each combination are applied using the membership degree of each class as the probability of occurrence of that class (fig. 2b). For example, if we have a V of 0.18 m/s, which corresponds to a membership degree of 0.3 in the VERY LOW class, and a D of 0.12 m, corresponding to a membership degree of 0.8 in the SHALLOW class, then the fuzzy Bayesian algorithm suggests that the membership degree of K (let's say HIGH, with a probability of 0.5 from the relevant rule) will be 0.3 x 0.8 x 0.5 = 0.12 (the joint probability of these classes occurring in this specific combination). The term Bayesian is used because we express the probability of K being HIGH GIVEN THAT V is VERY LOW and D is SHALLOW.
Fig. 2b. Aggregation of inputs with the fuzzy Bayesian algorithm
5. The final K is calculated from the different K classes observed, along with their probability values, by applying the 'expected utility' algorithm, in which a score is assigned to each K class (HIGH: 0.9, GOOD: 0.7, MODERATE: 0.5, POOR: 0.3, BAD: 0.1). The calculation for the example of fig. 2b goes as follows (a short sketch of steps 4 and 5 is given after the calculation):
K = 0.385 x 0.7 + 0.135 x 0.5 + 0.135 x 0.3 + 0.165 x 0.3 = 0.427 (MODERATE)
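The joint-probability weighting of step 4 and the expected-utility calculation of step 5 can be reproduced with the following Python sketch (illustrative only; the weights of fig. 2b are taken as given):

K_SCORES = {"HIGH": 0.9, "GOOD": 0.7, "MODERATE": 0.5, "POOR": 0.3, "BAD": 0.1}

def rule_weight(input_memberships, k_probability):
    # Joint probability: product of the input membership degrees and P(K class).
    weight = k_probability
    for m in input_memberships:
        weight *= m
    return weight

def expected_utility(weighted_classes):
    # weighted_classes: list of (K class, weight) pairs, as in fig. 2b.
    return sum(K_SCORES[k] * w for k, w in weighted_classes)

print(rule_weight([0.3, 0.8], 0.5))   # 0.12, as in the example of step 4
print(expected_utility([("GOOD", 0.385), ("MODERATE", 0.135),
                        ("POOR", 0.135), ("POOR", 0.165)]))   # 0.427 (MODERATE)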
A useful illustration of the processes applied in HABFUZZ is shown in the figure below. Please refer to the software's manual for a detailed description of each process.
References
Brookes, C.J., Kumar, V., Lane, S.N., 2010. A comparison of Fuzzy, Bayesian and Weighted Average formulations of an in-stream habitat suitability model. Proceedings of the International Congress on Environmental Modelling and Software, 5-8 July 2010, Ottawa, Canada. Available at http://scholarsarchive.byu.edu/cgi/viewcontent.cgi?article=2649&context=iemssconference
Mamdani, E.H., Assilian, S., 1975. An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies 7: 1-13.
Ross, T.J., 2010. Fuzzy logic with engineering applications. Third edition, John Wiley and Sons, UK. Available at http://onlinelibrary.wiley.com/book/10.1002/9781119994374