Intelligent system for Islamic prayer (salat) posture monitoring

ABSTRACT


INTRODUCTION
Salat (prayer) is one of the main pillars of Islam and is considered one of the most important aspects of our faith. Our beloved Prophet Muhammad (peace be upon him) received the commandment of salat during Isra' and Mi'raj (the night journey), giving hope to humanity once more after it had lost the light on how to worship the one true God, the Almighty Allah (Glorious is He and He is Exalted). At that time, Muslims learned how to perform salat by following the orders and actions of Prophet Muhammad (peace be upon him), observing them directly through sight and hearing. In other words, salat was learned solely by using the human senses to detect the correct movements and words. Although Muslims at that time could learn salat only through the Prophet's words and actions, this teaching was highly effective: Muslims today still perform salat in the same way as the Prophet.
As the world grows older, some Muslims tend to forget the proper way to perform salat as they become bound to worldly matters. Today, technologies are improving at a very fast pace, and much research and development has been conducted to improve our lives. This places a great responsibility on Muslim researchers to develop technology that benefits both this world and the hereafter. With this goal, developing the "Intelligent Salat Monitoring and Training System" as an educational tool could help many Muslims learn and recognize the proper way of performing salat.
A few studies have addressed the activity monitoring of salat. Alobaid and Rasheed [1] and Al-Ghannam and Dossari [2] used smartphone technology to recognize salat activities. Rabbi et al. [3] and Ibrahim and Ahmad [4] assessed the activities of salat using electromyographic (EMG) signals. For a salat inspection and training system for the Muslim community, however, the most important technology that we need to learn and master is machine vision and image processing. Computer vision is used to inspect and track human movement in various fields, such as sport, health care, and even games. As Muslims, we usually learn how to perform salat by following others. We follow the postures and movements of others mainly by observing them with our eyes, and we then judge whether what we have learned is correct or wrong. By combining this technology with the religious aspect, we gain many advantages. Using the system, we can learn the proper way of salat by comparing our posture with the correct postures stored in the system database, and the system can then give proper feedback to the user. The feedback can be in the form of words and numbers indicating the percentage of error in the salat movement.
Many researchers have developed algorithms to detect human body parts, movements, and postures, such as the face and hands. Some of these algorithms can detect the posture of the human body [5]. Different algorithms place different requirements on the system. Therefore, several considerations must be made to obtain a fully functional system. The approaches and algorithms used for the salat inspection and training system must be chosen so that the image to be measured and compared does not lack the information needed by the system. Elements such as the angle of sight, size, color, and texture of the image need to be measured using multiple algorithms to obtain an accurate result, so that the system does not give wrong feedback to the user. MATLAB is used to implement and test the proposed methods.
Muslim communities and converts to Islam across the world are in dire need of basic knowledge of salat. With the salat inspection and training system, this knowledge can be taught and shared with ease. The system addresses several problems: i) it helps Muslims across the world learn the correct way of performing salat anytime and anywhere; ii) it reduces the time, cost, and space needed for learning salat; iii) it protects learners from being exploited by fake preachers when learning salat; iv) it serves Muslims who feel embarrassed to learn salat from others; and v) it helps newly converted Muslims learn salat with ease.
Several movements in salat are considered important [6]. These movements and postures must be performed correctly so that Allah will accept our salat. Notably, salat consists of a sequence of movements that together form one complete cycle, known as a raka'ah. The sequence of one complete cycle is shown in Figure 1.

RELATED WORKS
2.1. Human body modelling in machine vision
Human motion and pose recognition methods can be categorized into two types: model-based and appearance-based methods. Model-based object tracking algorithms are based on simple CAD (computer-aided design) wire models of objects, as shown in Figure 2. Using this kind of model, the start and end points of the lines can be projected correctly onto the image plane, which allows real-time tracking of objects at a small computational cost. Appearance-based methods use no a priori knowledge of the data present in the image. Instead, they analyze the data using the statistics of the dataset available in the database to extract the modes, and thereby group the data in the best possible way. According to Azad et al., the appearance-based method uses various algorithms to illustrate the object [7]. In other words, appearance-based approaches are more reliable in many types of situations because they do not require a specific object to serve as a model.

Representation of human figure
2.2.1. Bounding box
One of the simplest representations of the human body is the bounding box. Although the function of the bounding box is limited, the model is useful when the human body in the picture is very small, because it then occupies only a few pixels. This reduces the complexity of image processing, but at the cost of accuracy. Figure 3 shows the bounding box as a human representation in human body modelling [8].
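As an illustration of the bounding box representation, the following minimal Python sketch computes the box around a binary silhouette. The function name `bounding_box` and the nested-list mask format are assumptions made for this example; they do not come from the cited work or from the MATLAB implementation used in this study.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the non-zero pixels, or None.

    `mask` is a list of rows; a non-zero value marks a body pixel.
    (Hypothetical input format chosen for this sketch.)
    """
    if not mask or not mask[0]:
        return None
    # Rows and columns that contain at least one body pixel.
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None  # empty silhouette
    return rows[0], cols[0], rows[-1], cols[-1]


silhouette = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(bounding_box(silhouette))  # (1, 1, 3, 3)
```

The box keeps only four numbers per person, which is why it is cheap to process but discards the posture detail that the other representations retain.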

Stick representation
The stick, or bone, figure is a typical representation of the human body in machine vision and image processing. The sticks act as bones and form the pose or movement of the human. Figure 4 shows the stick representation of the human body. The disadvantage of this representation is that some postures, such as sitting, are difficult to model because of occlusion [9].

Multi-dimensional representation
Hand gesture, as one of the important ways for humans to convey information and express intuitive intention, has the advantages of a high degree of differentiation, strong flexibility, and high efficiency of information transmission, which makes hand gesture recognition (HGR) one of the research hotspots in the field of human-machine interface (HMI) [10]. The two-dimensional (2D) contour representation projects the human body from three-dimensional (3D) space onto the 2D image plane [11]-[13]. It approximates the human body by using deformable contours, ribbons, or cardboards [14]. Figure 5 shows 2D images of the hand. The 3D representation describes the parts of the human body in 3D space using a combination of cylinders, as shown in Figure 6 [15]. The 3D representation can also use other shapes, such as cones or spheres, to represent the human body.

Algorithm for pattern and image matching
2.3.1. Histogram of the oriented gradient
A histogram of the oriented gradient (HOG) can be used for image matching purposes such as face recognition [16]-[18]. The idea behind HOG is to measure how gradient orientations are distributed within local regions of the image. It is very useful for matching and tracking textured objects that have inorganic shapes. For computer vision applications on human hands, HOG has been reported to perform better than other feature extraction models [19]. HOG has also been applied to base feature images to generate feature descriptors [20].
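To make the idea of a gradient-orientation histogram concrete, the sketch below computes a magnitude-weighted histogram for a single image cell in plain Python. It is a simplified illustration only: a full HOG descriptor also interpolates votes between bins and normalizes histograms over blocks of cells, and the function name, the nine-bin default, and the central-difference gradient are assumptions of this example rather than details taken from the cited works.

```python
import math


def orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a list of rows of gray levels. Orientations are folded
    into [0, 180) degrees and split into `bins` equal bins.
    Border pixels are skipped because central differences need neighbours.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]   # horizontal gradient
            gy = cell[r + 1][c] - cell[r - 1][c]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# A vertical edge: all gradient energy falls in the first bin (0 degrees).
vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
print(orientation_histogram(vertical_edge))
```

A horizontal edge would instead place its energy in the bin containing 90 degrees, which is how the histogram distinguishes the dominant edge directions of different postures.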

Hidden Markov model
The hidden Markov model (HMM) is widely used in speech recognition, and it is a kind of model that uses statistics to extract features. According to Wang et al., the HMM is more reliable in analyzing time-varying data with variations in space-time conditions [21]. In the matching procedure, the probability that an HMM generates the test symbol sequence corresponding to the features of the input image is computed. The HMM is considered one of the best algorithms for matching human motion patterns because its stochastic framework can handle uncertainty and unknowns [22]. However, this method has a significant disadvantage: the HMM is inefficient in handling three or more independent processes [23].
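The probability computation mentioned above is commonly carried out with the forward algorithm. The sketch below is a minimal Python version for a discrete HMM; the two-state model and its probabilities are invented purely for illustration and are not taken from the cited works.

```python
def forward_probability(observations, start, trans, emit):
    """Probability that the HMM generates `observations` (forward algorithm).

    start[s]     : probability of starting in state s
    trans[p][s]  : probability of moving from state p to state s
    emit[s][o]   : probability of emitting symbol o in state s
    """
    states = range(len(start))
    # Initialisation with the first observed symbol.
    alpha = [start[s] * emit[s][observations[0]] for s in states]
    # Induction over the remaining symbols.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs]
            for s in states
        ]
    return sum(alpha)


# Hypothetical two-state model with two observation symbols (0 and 1).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(forward_probability([0, 1], start, trans, emit))  # approximately 0.209
```

In a recognition setting, one such model would typically be trained per motion class, and the class whose model assigns the highest probability to the observed sequence would be selected.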

Euclidean distance
Euclidean distance can define the metric of the image efficiently. It uses the Euclidean metric to measure the straight-line distance between two points in Euclidean space. According to Wang et al., for images this distance is based on the summation of the pixel-wise intensity differences [24]. They stated that, with the traditional Euclidean distance, a small deformation may result in a large distance. To solve this problem, they proposed an alternative metric. The key properties of their method are simplicity in computation, relative insensitivity to small deformation, and the ease with which it can be embedded in most powerful image recognition techniques.

Temporal template
Bobick and Davis used temporal templates to recognize human movements by constructing a vector image and matching it against the stored representations of known movements in the database [25]. Two types of features were used, namely the motion-energy image and the motion-history image. These methods have many advantages: they support direct recognition of the motion, perform temporal segmentation instantly, are invariant to linear changes in speed, and run in real time on a standard platform. Some limitations were also reported: the method cannot handle incidental motion, and occlusion may sometimes occur.
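The two temporal-template features can be sketched as follows. This is a minimal Python illustration based on simple frame differencing: a pixel that moves is stamped with the value tau in the motion-history image (MHI), otherwise its value decays by one per frame, and the motion-energy image (MEI) marks every pixel whose history is still non-zero. The difference threshold of 30 gray levels and the function name are assumptions of this example.

```python
def motion_templates(frames, diff_threshold=30, tau=None):
    """Return (mei, mhi) for a list of grayscale frames (lists of rows).

    mhi: recently moving pixels hold larger values (tau = most recent).
    mei: 1 wherever motion occurred within the last tau frame steps.
    """
    if tau is None:
        tau = len(frames) - 1
    height, width = len(frames[0]), len(frames[0][0])
    mhi = [[0] * width for _ in range(height)]
    for prev, cur in zip(frames, frames[1:]):
        for r in range(height):
            for c in range(width):
                if abs(cur[r][c] - prev[r][c]) > diff_threshold:
                    mhi[r][c] = tau                      # motion: refresh
                else:
                    mhi[r][c] = max(0, mhi[r][c] - 1)    # no motion: decay
    mei = [[1 if v > 0 else 0 for v in row] for row in mhi]
    return mei, mhi


# Three tiny one-row frames: motion first at pixel 0, then at pixel 1.
frames = [[[0, 0, 0]], [[255, 0, 0]], [[255, 255, 0]]]
mei, mhi = motion_templates(frames)
print(mei)  # [[1, 1, 0]]
print(mhi)  # [[1, 2, 0]]
```

The MEI shows where motion happened, while the graded values in the MHI show how recently it happened, which encodes the direction of the movement.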

METHOD
3.1. Actual picture and mechanical design
To design the salat inspection and training system, we used polyvinyl chloride (PVC) pipe as the base of the design. In this design, we prioritize portability, because the system requires a large space to place or store. By using PVC pipe, the system can be assembled and disassembled easily, which takes less than five minutes. To build the base of the system, a combination of plain-ended pipes, equal elbow pipes, end cap pipes, and equal tee pipes is needed. Figure 7(a) shows an actual picture of the system, and the isometric view of the system is shown in Figure 7(b). In the actual picture, two black lines are drawn on the base carpet. The middle black line marks the initial position for at-tawarrok. The user sits there until the system has finished the inspection. The black line located near the back camera marks the initial position for takbiratul ihram, ruku', and sujud. The user performs each of these postures at its respective initial position, marked with the black lines. Two cameras are installed in the system, as shown in Figure 7(b). One camera is installed at the front to inspect the front part of the body, such as the hands and head, and is placed higher than the second one. The second camera is installed at the back to inspect the back part of the body, such as the legs, and is placed lower than the first one. The front camera is used to inspect the postures of salat for takbiratul ihram, ruku', and sujud, while the back camera is used to inspect the posture during at-tawarrok.
Two servomotors are installed in the system, located below the cameras. Their function is to change the camera angle when recording the video of the user's salat. The system can be carried and set up anywhere because of these features. In the base carpet, a force-sensing resistor is installed to inspect the user during sujud. It is used to check whether the required parts of the body, such as the forehead and nose, are touching the ground.

Experimental Method
In this study, we adopted a template matching approach because of its simplicity in real-time applications. To detect the human body, the RGB (red, green, and blue) color space is chosen, and the luminance and chrominance are then decorrelated. A given RGB image is converted to a grayscale image using the RGB-to-grayscale conversion equation.
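The conversion equation itself is not reproduced in this section. As a hedged illustration, the Python sketch below uses the common weighted sum gray = 0.2989 R + 0.5870 G + 0.1140 B; these are the standard luma weights, and it is an assumption of this example that the study used the same ones. The function name and the nested-list image format are also hypothetical.

```python
def rgb_to_gray(image):
    """Convert an RGB image (rows of (r, g, b) tuples, 0-255) to grayscale.

    Uses the weighted sum 0.2989*R + 0.5870*G + 0.1140*B (assumed weights),
    rounded to the nearest integer gray level.
    """
    return [
        [round(0.2989 * r + 0.5870 * g + 0.1140 * b) for (r, g, b) in row]
        for row in image
    ]


rgb = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 255, 0)],
]
print(rgb_to_gray(rgb))  # [[255, 0], [76, 150]]
```

Green contributes most to the result because the weights reflect the eye's greater sensitivity to green light, so the grayscale image preserves perceived brightness.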
The input image is converted to grayscale because it must be matched against the database images, which serve as templates. A matching approach must then be chosen. The matching process moves the template image to all possible positions in a larger source image and computes a numerical index that indicates how well the template matches the image in that position. One well-known matching measure is the Euclidean distance. Let I be a gray-level image and g be a gray-value template of size (n×m):

$$d(I, g, r, c) = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} \left( I(r+i, c+j) - g(i, j) \right)^{2}} \tag{2}$$

where (r, c) denotes the top left corner of template g. The second matching measure, which has advantages in accuracy and processing time over the Euclidean distance, is grey-level correlation:

$$\mathrm{cor} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^{2} \, \sum_{i=1}^{N} (y_i - \bar{y})^{2}}}$$

where x is the template gray-level image, $\bar{x}$ is the average gray level in the template image, y is the source image section, $\bar{y}$ is the average gray level in the source image section, and N is the number of pixels in the section image. The value cor lies between -1 and +1, with larger values representing a stronger relationship between the two images.
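The two matching measures described above can be sketched in plain Python as follows. This is an illustrative sketch, not the MATLAB implementation used in the study: the helper names are hypothetical, indices are zero-based (so (r, c) is the position of the template's first pixel), and a flat image section, for which the correlation is undefined, is given a score of 0 by assumption.

```python
import math


def section(image, r, c, n, m):
    """Flatten the n x m section of `image` whose top-left pixel is (r, c)."""
    return [image[r + i][c + j] for i in range(n) for j in range(m)]


def euclidean_distance(image, template, r, c):
    """Euclidean distance between the template and the section at (r, c)."""
    x = [v for row in template for v in row]
    y = section(image, r, c, len(template), len(template[0]))
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)))


def grey_level_correlation(image, template, r, c):
    """Correlation in [-1, 1] between the template and the section at (r, c)."""
    x = [v for row in template for v in row]
    y = section(image, r, c, len(template), len(template[0]))
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else 0.0  # flat section: assumed score 0


def best_position(image, template, score, pick):
    """Slide the template over every position; `pick` is min or max."""
    n, m = len(template), len(template[0])
    positions = [(r, c)
                 for r in range(len(image) - n + 1)
                 for c in range(len(image[0]) - m + 1)]
    return pick(positions, key=lambda p: score(image, template, p[0], p[1]))


image = [
    [10, 10, 10, 10],
    [10, 10, 50, 90],
    [10, 10, 90, 50],
    [10, 10, 10, 10],
]
template = [[50, 90], [90, 50]]
print(best_position(image, template, euclidean_distance, min))      # (1, 2)
print(best_position(image, template, grey_level_correlation, max))  # (1, 2)
```

The distance is minimized at the best match, whereas the correlation is maximized. Because the correlation subtracts the mean gray level and divides by the spread, it is unaffected by a uniform change in brightness or contrast, which the raw Euclidean distance is not.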
As we know, correlation matching never yields a 100% match, because the images differ in small details. Therefore, a threshold should be applied to the correlation result; the threshold can be set to the highest match value that occurred. For the feature extraction of this system, we considered HOG descriptors, as shown in Figure 8, to illustrate how the system could perform. Figure 8(a) shows the equivalent histogram of an image, and the obtained HOG feature of the image is shown in Figure 8(b). However, many other descriptors can be used or combined. Other descriptors apply the same operation of converting the image pixels into votes according to their gray level in the range 0-255, where 0 is black and 255 is white. Dividing the feature into log-polar bins instead of square bins is a commonly used approach. To identify an image, the computer needs descriptors, and the more image descriptors the system is trained with, the lower the error of the system. Combining multiple types of descriptors, such as scale-invariant feature transform (SIFT), gradient location and orientation histogram (GLOH), and speeded up robust features (SURF), will help to enhance the performance of the system.
The correct position, with the hands raised above the shoulders, is shown with its descriptor in Figure 9(a), whereas the wrong position, with the hands below the shoulders, is shown in Figure 9(b). The descriptor on the right side of each image shows the intensity variation in the image. Both HGR and HOG descriptors were used for matching the two positions, as shown in Figure 10; the error of the result can be recognized from the difference between two extreme points. The difference in strong corners between the two overlaid images represents the amount of unmatched features, or error, between the two images. The larger this difference, the more likely it is that the salat position performed by the worshipper is wrong.
Therefore, we need to increase the number of extracted features by increasing the number of corners, and to threshold the matching result so that the system flags the position as wrong if the difference between the two images exceeds 30%.
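One possible reading of this 30% rule is sketched below in Python: the difference is taken as the fraction of template corner points that have no corner of the input image nearby. The matching tolerance of five pixels, the function names, and this particular definition of the difference are assumptions made for illustration; the text does not specify how the percentage is computed.

```python
import math


def mismatch_ratio(template_corners, input_corners, tolerance=5.0):
    """Fraction of template corners with no input corner within `tolerance`.

    Corners are (x, y) tuples. The tolerance is a hypothetical parameter.
    """
    if not template_corners:
        return 0.0
    unmatched = sum(
        1 for t in template_corners
        if not any(math.dist(t, p) <= tolerance for p in input_corners)
    )
    return unmatched / len(template_corners)


def posture_is_wrong(template_corners, input_corners, max_mismatch=0.30):
    """Flag the posture as wrong when the mismatch exceeds 30%."""
    return mismatch_ratio(template_corners, input_corners) > max_mismatch


template_corners = [(i * 20, 0) for i in range(10)]
good_input = [(i * 20 + 1, 1) for i in range(8)]   # 8 of 10 corners matched
bad_input = [(i * 20 + 1, 1) for i in range(6)]    # 6 of 10 corners matched
print(posture_is_wrong(template_corners, good_input))  # False (20% mismatch)
print(posture_is_wrong(template_corners, bad_input))   # True  (40% mismatch)
```

Extracting more corners makes the ratio less sensitive to any single missed feature, which is consistent with the recommendation above to increase the number of corners.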

RESULTS AND DISCUSSION
In this part, the user uses the system's graphical user interface installed on the laptop. Figure 11 shows the interface provided by the system. The user can preview the correct posture of salat by clicking the preview button for each posture. After that, the user can click the blue start button in the interface for the system to start the inspection. Upon clicking the start button, a video of the person praying is taken using the video cameras attached to the system. Two video cameras are needed to inspect the user. One is located in front of the user to inspect the movements of the front parts of the body, such as the hands during takbiratul ihram. The other camera is located behind the user to inspect the back parts of the body, such as the legs during at-tawarrok. The video is recorded until a beep is heard, indicating that the system has finished capturing the user.
Figure 11. Graphical user interface (GUI) of Intelligent Salat Monitoring System
Once the system finishes taking the video, it filters the region of interest and converts the color from RGB to grayscale. The image is then matched against the database using template matching with the Euclidean distance as the measure. Grey-level correlation is then used to improve the accuracy of the system.
If the salat is performed correctly within the programmed threshold value shown in the graph, a message saying "GOOD PERFORMANCE" pops up in green text. Otherwise, the system suggests the correct posture in red text, indicating that the performance of the salat is bad. This stage applies to takbiratul ihram, ruku', and at-tawarrok only. For sujud, a special graph showing the number of force readings is presented to the user together with the performance. In this way, the system trains the user until the correct posture is performed.
The front camera is used for recording the video during takbiratul ihram, ruku', and sujud. The system gives feedback to the user by showing pictures and a graph. Figure 12 depicts a good takbiratul ihram. The left side is a screenshot of the recorded video, the middle graph shows the match percentage, and the notification image is shown on the right side. Figure 13 depicts a bad performance of takbiratul ihram, with the same layout: the screenshot of the recorded video on the left, the match percentage graph in the middle, and the notification image on the right.
The performance of a correct ruku' is shown in Figure 14, again with the screenshot of the recorded video on the left, the match percentage graph in the middle, and the notification image on the right. Figure 15 shows an incorrect ruku' performance: the template for the incorrect posture stored in the database (left), the bad performance of ruku', and the matching percentages, which do not reach the threshold.
If the system finds a frame with a 98% match, a green rectangle appears on the region of interest. If the match lasts for more than three seconds, the posture is considered correct and a notification pops up. However, if the match does not last more than three seconds, the system notifies the student via a pop-up image with a comment on the image. This is the feedback mechanism of the salat inspection and training system. All the results apply the same feedback concept except for sujud, for which additional information is added. A graph of the force-sensing resistor readings indicates that the user is performing sujud. Whenever the user's nose and forehead touch the sensor located in the base carpet, the force triggers the sensor, which gives a reading in MATLAB. Twelve force readings during sujud are set as the threshold to indicate a good performance of the sujud. Figure 16 shows the force sensor readings used to verify the performance of sujud. Figure 16(a) shows how feedback on good sujud performance is shown to the user, while Figure 16
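The feedback rules described in this section can be sketched in Python as follows. The sketch assumes that the system receives one match percentage per video frame at a known frame rate, and that a force reading counts as contact when it exceeds a level `contact_level`; both the function names and these parameters are hypothetical. The interpretation of the sujud threshold as "at least twelve contact readings" is likewise an assumption.

```python
def posture_is_correct(match_percentages, fps, match_level=98.0,
                       hold_seconds=3.0):
    """True if matches at or above `match_level` persist for more than
    `hold_seconds` of consecutive frames."""
    run = 0  # length of the current streak of matching frames
    for percentage in match_percentages:
        run = run + 1 if percentage >= match_level else 0
        if run / fps > hold_seconds:
            return True
    return False


def sujud_is_correct(force_readings, contact_level, required_readings=12):
    """True if at least `required_readings` readings indicate contact."""
    contacts = sum(1 for force in force_readings if force > contact_level)
    return contacts >= required_readings


fps = 10
print(posture_is_correct([99.0] * 31, fps))                    # True
print(posture_is_correct([99.0] * 20 + [50.0] + [99.0] * 20, fps))  # False
print(sujud_is_correct([0.0] * 5 + [2.5] * 12, contact_level=1.0))  # True
```

Requiring the match to persist, rather than accepting a single matching frame, prevents a posture that is only passed through briefly from being reported as correct.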

CONCLUSION
In conclusion, the first objective, to learn the correct postures in salat, was addressed by analyzing books of hadith and the works of Muslim scholars in the literature review section. The second objective, to develop an image-processing algorithm using MATLAB, was also achieved, as the results show quite good performance. The third objective of this study was to test the salat performance and provide feedback to the user. This was also achieved: based on the image matching result, the system displays a message about the salat performance and trains the user by giving the correct instructions for the current posture of the salat. The results are quite accurate, as the proposed method is able to identify and match the pattern with a recognition rate of up to 90% and inform users about their salat performance. The reading in the graph is more accurate when the user performs salat using the system itself, because the camera angle is fixed. Even when the posture is correct, some results show errors when the lighting is bad, because the pattern matching in MATLAB becomes unreliable when the lighting is insufficient. For example, the posture of salat may be correct, but the system keeps giving bad performance feedback to the user. This issue can be solved by using the system under sufficient light, which increases the accuracy of the overall system. A brightly lit room is recommended to ensure that clear images are captured. The camera angle also needs to be fixed and constant between the database images and the captured correct and wrong images for the system to detect the pattern without error.