Preliminary: State the random variables
Let x = altitude (in feet)
y = high temperature (in degrees F)
Now plot the x values on the horizontal axis and the y values on the vertical axis. Then set up a scale on each axis that fits the data. Once that is done, just plot each x and y value as an ordered pair. In R, the commands are:
independent variable<-c(type in data with commas in between values)
dependent variable<-c(type in data with commas in between values)
plot(independent variable, dependent variable, main="type in a title you want", xlab="type in a label for the horizontal axis", ylab="type in a label for the vertical axis", ylim=c(0, number above maximum y value))
For this example, that would be:
elevation<-c(7000, 4000, 6000, 3000, 7000, 4500, 5000)
temperature<-c(50, 60, 48, 70, 55, 55, 60)
plot(elevation, temperature, main="Temperature versus Elevation", xlab="Elevation (in feet)", ylab="Temperature (in degrees F)", ylim=c(0, 80))
Looking at the graph, it appears that there is a linear relationship between temperature and elevation. It also appears to be a negative relationship; that is, as elevation increases, the temperature decreases.
A time-series plot is a graph showing quantitative data measurements in chronological order. For example, a time-series plot could be used to show profits over the last 5 years. To create a time-series plot, the time always goes on the horizontal axis and the other variable goes on the vertical axis; then plot the ordered pairs and connect the dots. The purpose of a time-series graph is to look for trends over time. Be cautious, though: the trend may not continue. Just because you see an increase doesn’t mean the increase will continue forever. As an example, prior to 2007 many people noticed that housing prices were increasing, and the belief at the time was that housing prices would continue to increase. However, the housing bubble burst in 2007, and many houses lost value and haven’t recovered.
Example \(\PageIndex{3}\) Time-series plot
The following table tracks the weight of a dieter, where the time in months measures how long since the person started the diet.
Time (months) | 0 | 1 | 2 | 3 | 4 | 5 |
Weight (pounds) | 200 | 195 | 192 | 193 | 190 | 187 |
Make a time-series plot of this data
In R, the command would be:
variable1<-c(type in data with commas in between values; this should be the time variable)
variable2<-c(type in data with commas in between values)
plot(variable1, variable2, ylim=c(0, number over max), main="type in a title you want", xlab="type in a label for the horizontal axis", ylab="type in a label for the vertical axis")
lines(variable1, variable2) – connects the dots
For this example:
time<-c(0, 1, 2, 3, 4, 5)
weight<-c(200, 195, 192, 193, 190, 187)
plot(time, weight, ylim=c(0, 250), main="Weight over Time", xlab="Time (months)", ylab="Weight (pounds)")
lines(time, weight)
Notice that over the 5 months, the weight appears to be decreasing, though the decrease does not look large.
Be careful when making a graph. If you don’t start the vertical axis at 0, then the change can look much more dramatic than it really is. As an example, Graph 2.3.8 shows Graph 2.3.7 with a different scaling on the vertical axis. Notice the decrease in weight looks much larger than it really is.
Exercise \(\PageIndex{1}\)
80 | 79 | 89 | 74 | 73 | 67 | 79 |
93 | 70 | 70 | 76 | 88 | 83 | 73 |
81 | 79 | 80 | 85 | 79 | 80 | 79 |
58 | 93 | 94 | 74 |
67 | 67 | 76 | 47 | 85 | 70 |
87 | 76 | 80 | 72 | 84 | 98 |
84 | 64 | 65 | 82 | 81 | 81 |
88 | 74 | 87 | 83 |
Length of Metacarpal | Height of Person |
---|---|
45 | 171 |
51 | 178 |
39 | 157 |
41 | 163 |
48 | 172 |
49 | 183 |
46 | 173 |
43 | 175 |
47 | 173 |
Value | Rental | Value | Rental | Value | Rental | Value | Rental |
---|---|---|---|---|---|---|---|
81000 | 6656 | 77000 | 4576 | 75000 | 7280 | 67500 | 6864 |
95000 | 7904 | 94000 | 8736 | 90000 | 6240 | 85000 | 7072 |
121000 | 12064 | 115000 | 7904 | 110000 | 7072 | 104000 | 7904 |
135000 | 8320 | 130000 | 9776 | 126000 | 6240 | 125000 | 7904 |
145000 | 8320 | 140000 | 9568 | 140000 | 9152 | 135000 | 7488 |
165000 | 13312 | 165000 | 8528 | 155000 | 7488 | 148000 | 8320 |
178000 | 11856 | 174000 | 10400 | 170000 | 9568 | 170000 | 12688 |
200000 | 12272 | 200000 | 10608 | 194000 | 11232 | 190000 | 8320 |
214000 | 8528 | 280000 | 10400 | 200000 | 10400 | 200000 | 8320 |
240000 | 10192 | 240000 | 12064 | 240000 | 11648 | 225000 | 12480 |
289000 | 11648 | 270000 | 12896 | 262000 | 10192 | 244500 | 11232 |
325000 | 12480 | 310000 | 12480 | 303000 | 12272 | 300000 | 12480 |
Life Expectancy | Fertility Rate | Life Expectancy | Fertility rate |
---|---|---|---|
77.2 | 1.7 | 72.3 | 3.9 |
55.4 | 5.8 | 76.0 | 1.5 |
69.9 | 2.2 | 66.0 | 4.2 |
76.4 | 2.1 | 5.9 | 5.2 |
75.0 | 1.8 | 54.4 | 6.8 |
78.2 | 2.0 | 62.9 | 4.7 |
73.0 | 2.6 | 78.3 | 2.1 |
70.8 | 2.8 | 72.1 | 2.9 |
82.6 | 1.4 | 80.7 | 1.4 |
68.9 | 2.6 | 74.2 | 2.5 |
81.0 | 1.5 | 73.3 | 1.5 |
54.2 | 6.9 | 67.1 | 2.4 |
Prenatal Care (%) | Health Expenditure (% of GDP) |
---|---|
47.9 | 9.6 |
54.6 | 3.7 |
93.7 | 5.2 |
84.7 | 5.2 |
100.0 | 10.0 |
42.5 | 4.7 |
96.4 | 4.8 |
77.1 | 6.0 |
58.3 | 5.4 |
95.4 | 4.8 |
78.0 | 4.1 |
93.3 | 6.0 |
93.3 | 9.5 |
93.7 | 6.8 |
89.8 | 6.1 |
Year | 1983 | 1984 | 1985 | 1986 | 1987 | 1988 | 1989 | 1990 |
---|---|---|---|---|---|---|---|---|
Rate | 4.31 | 4.42 | 4.52 | 4.35 | 4.39 | 4.21 | 3.40 | 3.61 |
Year | 1991 | 1992 | 1993 | 1994 | 1995 | 1996 | 1997 | |
Rate | 3.67 | 3.61 | 2.98 | 2.95 | 2.72 | 2.95 | 2.3 | |
Date | Assets in Billions of AUD |
---|---|
Mar-2006 | 96.9 |
Jun-2006 | 107.4 |
Sep-2006 | 107.2 |
Dec-2006 | 116.2 |
Mar-2007 | 123.7 |
Jun-2007 | 134.0 |
Sep-2007 | 123.0 |
Dec-2007 | 93.2 |
Mar-2008 | 93.7 |
Jun-2008 | 105.6 |
Sep-2008 | 101.5 |
Dec-2008 | 158.8 |
Mar-2009 | 118.7 |
Jun-2009 | 111.9 |
Sep-2009 | 87.0 |
Dec-2009 | 86.1 |
Mar-2010 | 83.4 |
Jun-2010 | 85.7 |
Sep-2010 | 74.8 |
Dec-2010 | 76.0 |
Mar-2011 | 75.7 |
Jun-2011 | 75.9 |
Sep-2011 | 75.2 |
Dec-2011 | 87.9 |
Mar-2012 | 91.0 |
Jun-2012 | 90.1 |
Sep-2012 | 83.9 |
Dec-2012 | 95.8 |
Mar-2013 | 90.5 |
Year | CPI-U-RS1 index (December 1977=100) | Year | CPI-U-RS1 index (December 1977=100) |
---|---|---|---|
1947 | 37.5 | 1980 | 127.1 |
1948 | 40.5 | 1981 | 139.2 |
1949 | 40.0 | 1982 | 147.6 |
1950 | 40.5 | 1983 | 153.9 |
1951 | 43.7 | 1984 | 160.2 |
1952 | 44.5 | 1985 | 165.7 |
1953 | 44.8 | 1986 | 168.7 |
1954 | 45.2 | 1987 | 174.4 |
1955 | 45.0 | 1988 | 180.8 |
1956 | 45.7 | 1989 | 188.6 |
1957 | 47.2 | 1990 | 198.0 |
1958 | 48.5 | 1991 | 205.1 |
1959 | 48.9 | 1992 | 210.3 |
1960 | 49.7 | 1993 | 215.5 |
1961 | 50.2 | 1994 | 220.1 |
1962 | 50.7 | 1995 | 225.4 |
1963 | 51.4 | 1996 | 231.4 |
1964 | 52.1 | 1997 | 236.4 |
1965 | 52.9 | 1998 | 239.7 |
1966 | 54.4 | 1999 | 244.7 |
1967 | 56.1 | 2000 | 252.9 |
1968 | 58.3 | 2001 | 260.0 |
1969 | 60.9 | 2002 | 264.2 |
1970 | 63.9 | 2003 | 270.1 |
1971 | 66.7 | 2004 | 277.4 |
1972 | 68.7 | 2005 | 286.7 |
1973 | 73.0 | 2006 | 296.1 |
1974 | 80.3 | 2007 | 304.5 |
1975 | 86.9 | 2008 | 316.2 |
1976 | 91.9 | 2009 | 315.0 |
1977 | 97.7 | 2010 | 320.2 |
1978 | 104.4 | 2011 | 330.3 |
1979 | 114.4 |
Year | Median Income | Year | Median Income |
---|---|---|---|
1967 | 42,056 | 1990 | 49,950 |
1968 | 43,868 | 1991 | 48,516 |
1969 | 45,499 | 1992 | 48,117 |
1970 | 45,146 | 1993 | 47,884 |
1971 | 44,707 | 1994 | 48,418 |
1972 | 46,622 | 1995 | 49,935 |
1973 | 47,563 | 1996 | 50,661 |
1974 | 46,057 | 1997 | 51,704 |
1975 | 44,851 | 1998 | 53,582 |
1976 | 45,595 | 1999 | 54,932 |
1977 | 45,884 | 2000 | 54,841 |
1978 | 47,659 | 2001 | 53,646 |
1979 | 47,527 | 2002 | 53,019 |
1980 | 46,024 | 2003 | 52,973 |
1981 | 45,260 | 2004 | 52,788 |
1982 | 45,139 | 2005 | 53,371 |
1983 | 44,823 | 2006 | 53,768 |
1984 | 46,215 | 2007 | 54,489 |
1985 | 47,079 | 2008 | 52,546 |
1986 | 48,746 | 2009 | 52,195 |
1987 | 49,358 | 2010 | 50,831 |
1988 | 49,737 | 2011 | 50,054 |
1989 | 50,624 |
See solutions
B1 assets of financial institutions. (2013, June 27). Retrieved from www.rba.gov.au/statistics/tables/xls/b01hist.xls
Benen, S. (2011, September 02). [Web log message]. Retrieved from http://www.washingtonmonthly.com/pol...edit031960.php
Capital and rental values of Auckland properties. (2013, September 26). Retrieved from http://www.statsci.org/data/oz/rentcap.html
Contraceptive use. (2013, October 9). Retrieved from http://www.prb.org/DataFinder/Topic/...gs.aspx?ind=35
Deaths from firearms. (2013, September 26). Retrieved from http://www.statsci.org/data/oz/firearms.html
DeNavas-Walt, C., Proctor, B., & Smith, J. U.S. Department of Commerce, U.S. Census Bureau. (2012). Income, poverty, and health insurance coverage in the United States: 2011 (P60-243). Retrieved from website: www.census.gov/prod/2012pubs/p60-243.pdf
Density of people in Africa. (2013, October 9). Retrieved from http://www.prb.org/DataFinder/Topic/...249,250,251,252,253,254,34227,255,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,294,295,296,297,298,299,300,301,302,304,305,306,307,308
Department of Health and Human Services, ASPE. (2013). Health insurance marketplace premiums for 2014. Retrieved from website: aspe.hhs.gov/health/reports/2...b_premiumslandscape.pdf
Electricity usage . (2013, October 9). Retrieved from http://www.prb.org/DataFinder/Topic/...s.aspx?ind=162
Fertility rate. (2013, October 14). Retrieved from http://data.worldbank.org/indicator/SP.DYN.TFRT.IN
Fuel oil usage. (2013, October 9). Retrieved from http://www.prb.org/DataFinder/Topic/...s.aspx?ind=164
Gas usage. (2013, October 9). Retrieved from http://www.prb.org/DataFinder/Topic/...s.aspx?ind=165
Health expenditure. (2013, October 14). Retrieved from http://data.worldbank.org/indicator/SH.XPD.TOTL.ZS
Hinatov, M. U.S. Consumer Product Safety Commission, Directorate of Epidemiology. (2012). Incidents, deaths, and in-depth investigations associated with non-fire carbon monoxide from engine-driven generators and other engine-driven tools, 1999-2011. Retrieved from website: www.cpsc.gov/PageFiles/129857/cogenerators.pdf
Life expectancy at birth. (2013, October 14). Retrieved from http://data.worldbank.org/indicator/SP.DYN.LE00.IN
Median income of males. (2013, October 9). Retrieved from http://www.prb.org/DataFinder/Topic/...s.aspx?ind=137
Median income of males. (2013, October 9). Retrieved from http://www.prb.org/DataFinder/Topic/...s.aspx?ind=136
Prediction of height from metacarpal bone length. (2013, September 26). Retrieved from http://www.statsci.org/data/general/stature.html
Pregnant woman receiving prenatal care. (2013, October 14). Retrieved from http://data.worldbank.org/indicator/SH.STA.ANVC.ZS
United States unemployment. (2013, October 14). Retrieved from http://www.tradingeconomics.com/unit...mployment-rate
Weissmann, J. (2013, March 20). A truly devastating graph on state higher education spending. The Atlantic. Retrieved from http://www.theatlantic.com/business/...ending/274199/
Many powerful approaches to data analysis communicate their findings via graphs. These are an important counterpart to data analysis approaches that communicate their findings via numbers or tables.
Here we will illustrate some of the most common approaches for graphical data analysis. Throughout this discussion, it is important to remember that graphical data analysis methods are subject to the same principles as non-graphical methods. A graph can be either informative or misleading, just like any other type of statistical result. To understand whether a graph is informative, we should consider the following:
Every graph should provide insight into the specific research question that is the overall goal of the data analysis.
The graph is constructed using a sample of data, but the purpose of the graph is to learn about the population that the sample represents.
What statistical principle or concept is the graph based on?
What are the theoretical properties of any numerical summaries that are shown in the graph?
Almost every statistical graphic conveys a statistical concept that can be defined in a non-graphical manner. Graphs may show associations, location, dispersion, tails, conditioning, or almost any other statistical feature of the data or population. Graphs make it easier for the viewer to digest such information, but when interpreting a graph it is always important to keep in mind the specific statistical concept on which the graph is based.
Statistical graphics have an aesthetic dimension that is usually not evident when presenting findings through, say, tables. Our goal here is to focus on the content of graphs, not their aesthetic properties. Very crude graphs that have deep content are much more informative than beautiful graphs that convey only superficial content. In recent years, the field of infographics has grown rapidly. There is no sharp line dividing infographics from statistical graphs; however, in general, the former tend to convey relatively simple insights in an aesthetically engaging way, while the latter aim to convey deeper and more subtle insight, with less focus on presentation.
One of the main challenges in statistical graphics is to fit the greatest amount of useful information into a single graph, while allowing the graph to remain interpretable. More complex graphs can suffer from overplotting, in which the plot elements are so crowded on the page that they fall on top of each other. This can limit the legibility of plots formed from large datasets unless a great deal of preliminary summarization of the data is performed.
Another challenge that arises in graphing complex datasets is that most graphs are two-dimensional, so that they can be viewed on a screen (or printed on paper). Some graphing techniques extend to three dimensions, but many datasets have a natural dimensionality that is much greater than 2 or 3. A few methods for graphing work around this, but require more effort from the person viewing the graph.
Boxplots are a graphical representation of the distribution of a single quantitative variable. A boxplot is based on a set of quantiles calculated using a sample of data. Below is an example of a single boxplot, drawn horizontally, showing the distribution of income values based on a sample of 100 individuals.
The “box” in a boxplot (shaded blue above) spans from the 25th to the 75th percentiles of the data, with an additional line drawn cross-wise through the box at the median. “Whiskers” extend from either end of the box, and are intended to cover the range of the data, excluding “outliers”.
The concept of an outlier is extremely problematic and no generically useful definition of outliers has been proposed. For the purpose of drawing a boxplot, the most common convention is to plot the upper (right-most) whisker at the 75th percentile plus 1.5 times the IQR, or to the greatest data value less than this quantity. Analogously, the lower (left-most) whisker is drawn at the 25th percentile minus 1.5 times the IQR, or to the least data value greater than this quantity. Finally, individual points sometimes called “fliers” are drawn corresponding to any value that falls outside the range spanned by the whiskers. A single box-plot, as above, is often drawn horizontally, but may also be drawn vertically.
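The whisker convention described above can be sketched in a few lines of Python. This is a hypothetical helper, not part of any plotting library, and note that packages differ slightly in how they compute quantiles:

```python
import statistics

def boxplot_stats(data):
    """Compute the positions used to draw a boxplot under the 1.5*IQR convention."""
    # 25th, 50th, and 75th percentiles (quantile conventions vary between packages)
    q1, med, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lo_limit, hi_limit = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers reach the most extreme data values still inside the limits
    lo_whisker = min(x for x in data if x >= lo_limit)
    hi_whisker = max(x for x in data if x <= hi_limit)
    # "Fliers": individual points drawn beyond the whiskers
    fliers = sorted(x for x in data if x < lo_whisker or x > hi_whisker)
    return lo_whisker, q1, med, q3, hi_whisker, fliers
```

For example, on the sample 1 through 11 plus a single value of 100, the upper whisker stops at 11 and 100 is flagged as a flier.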
There are many alternative ways of defining the locations of the whiskers in a boxplot. The approach described above is the most common, and is chosen so that with “light tailed” distributions, well under 1% of the data will fall outside of the whiskers.
The boxplot above shows a right-skewed distribution. This is evident because the upper whisker is further from the box than the lower whisker. Also, within the box, the median is closer to the lower side of the box than to the upper side of the box. Overall, the lower quantiles are more compressed, and the upper quantiles are more spread out, which is a feature of right-skewed distributions.
Boxplots are commonly used to compare distributions. A “side-by-side” or “grouped” boxplot is a collection of boxplots drawn for different subsets of data, plotted on the same axes. These subsets usually result from a stratification of the data, according to some stratifying factor that partially accounts for the heterogeneity within the population of interest. For example, below we consider boxplots showing the distribution of income, stratified by sex.
A histogram is a very familiar way to visualize quantitative data. A histogram is constructed by breaking the range of the values into bins and counting the number (or proportion) of observations that fall into each bin. The shape of a histogram shows visually how likely we are to observe a data value in each part of the range. We are more likely to observe values where the histogram bars are higher, and less likely to observe values where the histogram bars are lower.
Histograms closely resemble “bar charts”, but with the added statistical aspect that the goal is to capture the density at each possible point in the population. “Density” is a measure of how commonly we observe data “near”, rather than “at” a point. For example, the density of household incomes at 45,000 USD would not be the exact number or frequency of households with this income. Instead, it reflects the frequency of households that have an income near 45,000 USD.
A histogram can be used to assess almost any property of a distribution. The common measures of location and dispersion can be judged from visual inspection of the histogram. As always, we should remember that features of the histogram may not always reflect features of the population from which the data were sampled. For example, a histogram may show two modes (i.e. is bimodal ) even when the underlying distribution only has one mode (i.e. is unimodal ). Moreover, the number of modes in a histogram can change as the bin width is varied.
Histograms are easy to communicate about, but may not be effective when working with small samples, where they can accentuate non-generalizable features of the sample (i.e. characteristics of the sample that are not present in the population). This is reflected in the following mathematical fact. For many statistics, if we wish to reduce the error relative to the population value of the statistic by a factor of two, we need to increase the sample size by a factor of four. In the case where we are aiming to estimate a density, in order to reduce the error by a factor of two, we need to increase the sample size by a factor of eight.
With a sufficiently large collection of representative data, the histogram should closely match the population’s probability density function (PDF). The PDF is usually a smooth curve, rather than a series of steps as in a histogram. This fact inspired the development of a modified version of a histogram that presents us with a smooth curve instead of a series of steps. This technique is called kernel density estimation ( KDE ). It produces graphs such as shown below.
Kernel density estimates may provide a somewhat more accurate estimation of the underlying density function compared to a histogram. But like a histogram, they can be unstable and produce artifacts. For example, the KDE above shows positive density for negative income values, even though all of the income values used to fit the KDE were positive (in some cases, income can take a negative value, but in this case no such values were present). More advanced KDE methods not used here can mitigate this issue.
One advantage of using a KDE rather than a histogram is that it is easier to overlay multiple KDEs on the same axes for comparison without too much overplotting. This might allow us to compare, say, the distributions of female and male incomes as follows.
A quantile plot is a plot of the pairs \((p, q_p)\), where \(q_p\) is the p’th quantile of a collection of quantitative values. Since \(p\) can be any real number between 0 and 1, the graph of these pairs constitutes a function, and by construction this must be a non-decreasing function. A quantile plot contains essentially the same information as a histogram, but represents it in a very different way. Note that unlike the histogram, for which the bin width is a parameter that must be selected, there is no such parameter in the quantile plot. Arguably, the quantile plot is a more stable and informative summary of a sample, especially if the sample size is moderate. However, most people are more comfortable interpreting histograms than quantile functions.
As an example, the following plot shows simulated systolic blood pressure values for a sample of females and a sample of males. In this case, at every probability point \(p\) , the blood pressure quantile for males is greater than the blood pressure quantile for females, indicating that male blood pressure is “stochastically greater” than female blood pressure.
Below is another example that shows two quantile functions, but in this case the quantile functions cross. As a result, there is no “stochastic ordering” between the data for females and for males. Also note that the quantile curve for females is steeper than the curve for males, indicating that the female blood pressure values are more dispersed than those for the males.
A quantile-quantile plot, or QQ plot, is a plot based on quantiles that is used to compare two distributions. Recall that a quantile plot plots the pairs \((p, q_p)\) for one sample. A QQ plot plots the pairs \((q^{(1)}_p, q^{(2)}_p)\), where \(q^{(1)}_p\) are the quantiles for the first sample, and \(q^{(2)}_p\) are the quantiles for the second sample. In a QQ plot, the value of \(p\) is “implicit” – each point on the graph corresponds to a specific value of \(p\), but you cannot see what this value is by inspecting the graph.
As an example, suppose we are comparing the number of minutes of sleep during one night for teenagers and adults. This might give us the following QQ-plot:
The above QQ-plot shows us that teenagers tend to sleep longer than adults, and this is especially true at the upper end of the range. The QQ-plot approximately passes through the point (600, 800), meaning that for some probability p, 600 is the p’th quantile for adults and 800 is the p’th quantile for teenagers.
The slope of the curve in the QQ-plot reflects the relative levels of dispersion in the two distributions being compared. Since the slope of the curve in the above QQ-plot is greater than that of the diagonal reference line, it follows that the values plotted on the vertical axis (teenager’s values) are more dispersed than the values plotted on the horizontal axis (adult’s values).
An important property of a QQ-plot is that if the plot shows a linear relationship between the quantiles, then the two distributions are related via a location/scale transformation . That is, there is a linear function \(a + bx\) that maps one distribution to the other. In the example above, there is a substantial amount of curvature in the graph, so it does not seem to be the case that the sleep durations for adults and teenagers are related via a location/scale transformation.
Dot plots display quantitative data that are stratified into groups. One axis of the plot is used to display the quantitative measure, and the other axis is used to separate the results for different groups. A series of parallel “guide lines” are used to show which points belong to each group. Dot plots are often used to display a collection of numerical summary statistics in visual form. Sometimes people say that dot plots are used to “convert tables into graphs”. Due to overplotting, dot plots are less commonly used to show raw data. The example below shows how dot plots can be used to display the median age stratified by sex, for people living in each of eleven countries.
The plot above shows that the median age for females is greater than the median age for males in every country. This is mainly due to females having longer life expectancies than males. We also see that some countries have much lower median ages for both sexes compared to other countries. Countries that have recently had high birth rates, such as Ethiopia and Nigeria, tend to have much lower median ages than countries with lower birth rates, such as Japan.
A scatterplot is a very widely-used method for visualizing bivariate data. They have many uses, but the most relevant for us is to plot the joint (empirical) distribution of two quantitative values. As an example, suppose that we observe paired data values giving the annual minimum and annual maximum temperature at a location. We could view these data with a scatterplot, placing, say, the minimum temperature value on the horizontal (x) axis, and the maximum temperature value on the vertical (y) axis. The number of points is the sample size, here being the number of locations for which temperature data are available. A possible graph of this type is shown below.
Several characteristics of the relationship between minimum and maximum temperature are evident from the plot above. The maximum temperature at each location is at least as large as the minimum temperature. There is a positive association, in which locations with a lower minimum temperature tend to have a lower maximum temperature compared to places with a higher minimum temperature, but there is a lot of scatter around this trend. Warmer places tend to have a smaller range between their minimum and maximum temperatures. Concretely, locations on the equator and at low elevation, such as Singapore, have relatively constant temperature throughout the year. Locations near the center of large continents, like Winnipeg, Canada, can have extremely cold winters and also rather hot summers. Coastal regions that are far from the equator, such as Dublin, Ireland, have mild winters and cool summers.
To aid in interpreting a scatterplot, it is useful to plot a smooth curve that runs through the center of the data. This is called scatterplot smoothing , and can be accomplished with several algorithms, one of which is known as lowess . The population analogue of a scatterplot smooth is the conditional mean , or conditional expectation , denoted \(E[Y|X=x]\) , for the conditional mean of \(Y\) given \(X\) . The conditional mean is a function of \(x\) , and can be evaluated at any point \(x\) in the domain of \(X\) . The conditional mean is (roughly speaking), the average of all values of \(Y\) whose corresponding value of \(X\) is near \(x\) .
The plot below adds the estimated conditional mean (orange curve) to the scatterplot of temperature data discussed above. The conditional mean curve is increasing, showing that, as noted above, a location with lower annual minimum temperature tends on average to have a lower annual maximum temperature (relative to other locations).
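A crude analogue of scatterplot smoothing can be sketched as a local average. This is a simplification of lowess, which fits weighted local regressions rather than plain means, and the `window` width is a tuning choice:

```python
def local_mean_smooth(x, y, window):
    """Estimate E[Y | X = x0] at each observed x0 by averaging the y-values
    of all points whose x-value lies within `window` of x0."""
    fitted = []
    for x0 in x:
        near = [yi for xi, yi in zip(x, y) if abs(xi - x0) <= window]
        fitted.append(sum(near) / len(near))
    return fitted
```

A very small window reproduces the raw data, while a very large window collapses the fit to the overall mean of \(Y\); a good smoother sits between these extremes.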
Some data have a serial structure, meaning that the values are observed with an ordering. Very often, such observations are made over time, which gives us time series or longitudinal data. Sometimes we observe a single time series over a long period of time, such as the value of a commodity in a market recorded every day over many years. Other times, we observe many short time series recorded irregularly. We may plot these time series together, leading to what is sometimes called a “spaghetti plot”. For example, in a study of human growth, we may observe measurements of the body weight of research subjects at various ages, giving us the spaghetti plot below:
Scatterplots in the plane are limited to two dimensions. Various techniques have been developed to overcome this limitation, one of which is the parallel coordinate plot . A parallel coordinate plot places the coordinate axes for the multiple dimensions as parallel lines, rather than as perpendicular lines. Using parallel lines means that data for far more than two or three variables can be placed on a single page.
Below is an example of a parallel coordinates plot, showing four attributes of a set of ten countries. A scatterplot of these points would live in four-dimensional space, which is quite challenging to visualize directly. Note that the attributes are converted to Z-scores, which is common in a parallel coordinates plot when the variables being plotted fall in very different ranges. The plot shows us that the life expectancies for females and for males are quite similar – the country with the highest life expectancy for females also has the highest life expectancy for males, and the country with the lowest life expectancy for females also has the lowest life expectancy for males. There is also a substantial positive relationship between the economic status of a country, as measured by its gross domestic product (GDP), and life expectancy. However, no relationship is evident between GDP and population, or between either of the life expectancy variables and population.
The graphs above primarily use quantitative data. A mosaic plot is a plot that is used with nominal data. Specifically mosaic plots are used when the units of analysis are cross-classified according to two nominal factors. In the example below, people with cancer are cross-classified by their biological sex, and by the type of cancer that they have:
The width of each box in the mosaic plot corresponds to the relative overall prevalence of the corresponding cancer type. The heights of the boxes correspond to the sex-specific prevalences. Based on this graph, we see that digestive, lung, and breast cancers are much more common than, say, oral and endocrine cancers. The mosaic plot also shows us that while breast and endocrine cancers are more common in females, the other cancer types are more common in males.
An important property of a mosaic plot is that the area of each box is proportional to the number of units that fall into the box. Thus, we can see that the area of the female breast cancer box is larger than the combined areas of the female and male lung cancer boxes. Thus, there are more cases of breast cancer in females than the combined cases of lung cancer for both sexes.
This article discusses MusGConv, a perception-inspired graph convolution block for symbolic musical applications.
Emmanouil Karystinaios
Towards Data Science
In the field of Music Information Research (MIR), the challenge of understanding and processing musical scores has continually attracted new methods and approaches. Most recently, many graph-based techniques have been proposed to target music understanding tasks such as voice separation, cadence detection, composer classification, and Roman numeral analysis.
This blog post covers one of my recent papers in which I introduced a new graph convolutional block, called MusGConv , designed specifically for processing music score data. MusGConv takes advantage of music perceptual principles to improve the efficiency and the performance of graph convolution in Graph Neural Networks applied to music understanding tasks.
Traditional approaches in MIR often rely on audio or symbolic representations of music. While audio captures the intensity of sound waves over time, symbolic representations like MIDI files or musical scores encode discrete musical events. Symbolic representations are particularly valuable as they provide higher-level information essential for tasks such as music analysis and generation.
However, existing techniques based on symbolic music representations often borrow from computer vision (CV) or natural language processing (NLP) methodologies: for instance, representing music as a “pianoroll” in a matrix format and treating it similarly to an image, or representing music as a series of tokens and treating it with sequential models or transformers. These approaches, though effective, can fall short in fully capturing the complex, multi-dimensional nature of music, which includes hierarchical note relations and intricate pitch-temporal relationships. Some recent approaches instead model the musical score as a graph and apply Graph Neural Networks to solve various tasks.
The fundamental idea of GNN-based approaches to musical scores is to model a musical score as a graph where notes are the vertices and edges are built from the temporal relations between the notes. To create a graph from a musical score we can consider four types of edges (see Figure below for a visualization of the graph on the score):
A GNN can then operate on the graph built from the notes and these four types of relations.
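The four edge types are not listed in this copy of the post; in score-graph work they are commonly onset (notes starting together), consecutive (one note starts when another ends), during (one note starts while another sounds), and silence (notes separated by a rest). A minimal Python sketch under that assumption, not the paper's actual construction:

```python
# Sketch: building a score graph from a note list. Edge definitions here are
# an approximation; each note is (onset, duration, pitch) in beats / MIDI numbers.

def build_score_graph(notes):
    """Return edges as (src, dst, type) tuples over note indices."""
    edges = []
    for i, (on_i, dur_i, _) in enumerate(notes):
        off_i = on_i + dur_i
        for j, (on_j, dur_j, _) in enumerate(notes):
            if i == j:
                continue
            if on_i == on_j:
                edges.append((i, j, "onset"))        # start together
            elif off_i == on_j:
                edges.append((i, j, "consecutive"))  # j starts when i ends
            elif on_i < on_j < off_i:
                edges.append((i, j, "during"))       # j starts while i sounds
    # "silence" edges: connect a note to the next onsets after a rest
    for i, (on_i, dur_i, _) in enumerate(notes):
        off_i = on_i + dur_i
        sounding = any(on <= off_i < on + dur for on, dur, _ in notes)
        later = [on for on, _, _ in notes if on > off_i]
        if later and not sounding:                   # a rest follows note i
            nxt = min(later)
            for j, (on_j, _, _) in enumerate(notes):
                if on_j == nxt:
                    edges.append((i, j, "silence"))
    return edges

# C major triad, then a melody note after a one-beat rest
notes = [(0, 2, 60), (0, 2, 64), (0, 2, 67), (3, 1, 72)]
edges = build_score_graph(notes)
```

On this toy input the triad produces six directed onset edges and three silence edges into the final note.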
MusGConv is designed to leverage music score graphs and enhance them by incorporating principles of music perception into the graph convolution process. It focuses on two fundamental dimensions of music: pitch and rhythm, considering both their relative and absolute representations.
Absolute representations refer to features that can be attributed to each note individually, such as the note's pitch or spelling, its duration, or any other per-note feature. On the other hand, relative features are computed between pairs of notes, such as the musical interval between two notes, or their onset difference, i.e. the difference between the times at which they occur.
The importance and coexistence of the relative and absolute representations can be understood from a transpositional perspective in music. Imagine the same musical content transposed: the intervallic relations between notes stay the same, but the pitch of each note is altered.
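The transposition argument can be made concrete in a few lines of Python (an illustrative sketch, not the paper's code; note tuples and feature choices are my own):

```python
# Absolute per-note features vs. relative pair features, and why relative
# features survive transposition. Each note is (onset, duration, pitch).

def absolute_features(note):
    onset, duration, pitch = note
    return (pitch, duration)            # attributed to one note

def relative_features(a, b):
    interval = b[2] - a[2]              # pitch interval in semitones
    onset_diff = b[0] - a[0]            # time between the two onsets
    return (interval, onset_diff)

melody = [(0, 1, 60), (1, 1, 64), (2, 2, 67)]             # C, E, G
transposed = [(on, dur, p + 2) for on, dur, p in melody]  # up a whole tone

rel = [relative_features(a, b) for a, b in zip(melody, melody[1:])]
rel_t = [relative_features(a, b) for a, b in zip(transposed, transposed[1:])]
assert rel == rel_t                  # intervals unchanged by transposition
assert absolute_features(melody[0]) != absolute_features(transposed[0])
```

The relative features are identical for the two excerpts while the absolute pitch of every note has changed, which is exactly why both views carry complementary information.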
To fully understand the inner workings of the MusGConv convolution block it is important to first explain the principles of Message Passing.
In the context of GNNs, message passing is a process in which vertices within a graph exchange information with their neighbors to update their own representations. This exchange allows each node to gather contextual information from the graph, which is then used for predictive tasks.
The message passing process is defined by the following steps:
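The individual steps are not reproduced in this copy; in the standard formulation they are message computation, aggregation, and update, which can be sketched with plain Python lists instead of tensors:

```python
# Minimal message-passing layer over an edge list (standard GNN recipe:
# message, aggregate, update); a didactic sketch, not a real GNN framework.

def message_passing(node_feats, edges):
    """node_feats: list of floats; edges: (src, dst) pairs."""
    # 1. Message: each source node sends its feature along the edge.
    # 2. Aggregate: each destination node sums its incoming messages.
    agg = [0.0] * len(node_feats)
    for src, dst in edges:
        agg[dst] += node_feats[src]
    # 3. Update: combine each node's own feature with its aggregate.
    return [h + a for h, a in zip(node_feats, agg)]

feats = [1.0, 2.0, 3.0]
edges = [(0, 1), (2, 1), (1, 2)]
print(message_passing(feats, edges))  # -> [1.0, 6.0, 5.0]
```

Node 1 receives messages from nodes 0 and 2, node 2 from node 1, and node 0 receives nothing, so only its own feature survives.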
MusGConv alters the standard message passing process mainly by incorporating both absolute features as node features and relative musical features as edge features. This design is tailored to fit the nature of musical data.
The MusGConv convolution is defined by the following steps:
By designing the message passing mechanism in this way, MusGConv attempts to preserve the relative perceptual properties of music (such as intervals and rhythms), leading to more meaningful representations of musical data.
Should edge features be absent or deliberately not provided, MusGConv computes the edge features between two nodes as the absolute difference between their node features. The version of MusGConv with edge features is named MusGConv(+EF) in the experiments.
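The idea of messages that combine absolute node features with relative edge features, including the absolute-difference fallback just described, can be sketched as follows (an illustrative re-implementation in plain Python, not the paper's code):

```python
# MusGConv-style convolution step: each message concatenates the neighbour's
# (absolute) features with the edge's relative features; when no edge
# features are given, they default to |x_i - x_j|.

def musgconv_step(x, edges, edge_feats=None):
    """x: list of per-node feature lists; edges: (src, dst) pairs."""
    if edge_feats is None:
        # fallback: relative features as absolute node-feature differences
        edge_feats = [[abs(a - b) for a, b in zip(x[s], x[d])]
                      for s, d in edges]
    agg = [[0.0] * (len(x[0]) * 2) for _ in x]
    for (s, d), ef in zip(edges, edge_feats):
        msg = x[s] + ef                     # concat absolute + relative parts
        agg[d] = [a + m for a, m in zip(agg[d], msg)]
    # update: concatenate each node's own features with its aggregate
    return [xi + ai for xi, ai in zip(x, agg)]

x = [[60.0, 1.0], [64.0, 1.0]]              # e.g. (pitch, duration) per note
out = musgconv_step(x, [(0, 1)])
print(out[1])  # -> [64.0, 1.0, 60.0, 1.0, 4.0, 0.0]
```

A real implementation would of course apply learned weight matrices and nonlinearities to the message and update steps; the sketch only shows where the relative features enter the computation.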
To demonstrate the potential of MusGConv, I discuss below the tasks and experiments conducted in the paper. All models, regardless of the task, are designed with the pipeline shown in the figure below. When MusGConv is employed, the GNN blocks are replaced by MusGConv blocks.
I decided to apply MusGConv to four tasks: voice separation, composer classification, Roman numeral analysis, and cadence detection. Each of these tasks presents a different taxonomy from a graph learning perspective. Voice separation is a link prediction task, composer classification is a global classification task, cadence detection is a node classification task, and Roman numeral analysis can be viewed as a subgraph classification task. Therefore, we are exploring the suitability of MusGConv not only from a musical analysis perspective but throughout the spectrum of the graph deep learning task taxonomy.
Voice separation is the detection of individual monophonic streams within a polyphonic music excerpt. Previous methods have employed GNNs to solve this task. From a GNN perspective, voice separation can be viewed as a link prediction task, i.e. for every pair of notes we predict whether they are connected by an edge. The product of the link prediction process should be a graph in which consecutive notes in the same voice are connected; the voices are then the connected components of the predicted graph. I point readers to this paper for more information on voice separation using GNNs.
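The final step, recovering voices as connected components of the predicted graph, is simple enough to sketch directly (plain BFS over an edge list; the edges would come from the link predictor):

```python
# Voices as connected components of the predicted note-to-note graph.
from collections import defaultdict, deque

def voices_from_edges(n_notes, predicted_edges):
    adj = defaultdict(list)
    for a, b in predicted_edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, voices = set(), []
    for start in range(n_notes):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:                      # BFS over one component
            u = queue.popleft()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        voices.append(sorted(comp))
    return voices

# notes 0-1-2 were linked into one voice, notes 3-4 into another
print(voices_from_edges(5, [(0, 1), (1, 2), (3, 4)]))  # -> [[0, 1, 2], [3, 4]]
```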
For voice separation, the pipeline in the figure above covers the GNN encoder part of the architecture; link prediction takes place in the task-specific module. To use MusGConv, it is sufficient to replace the convolution blocks of the GNN encoder with MusGConv blocks. This simple substitution results in more accurate predictions with fewer mistakes.
Since interpreting deep learning systems is not exactly trivial, it is not easy to pinpoint the reason for the improved performance. From a musical perspective, consecutive notes in the same voice tend to have small relative pitch differences, and the design of MusGConv clearly highlights pitch differences through the relative edge features. However, I should also say, from individual observations, that music does not strictly follow any rules.
Composer classification is the process of identifying a composer from a music excerpt. Previous GNN-based approaches for this task receive a score graph as input, similarly to the pipeline shown above, and then include a global pooling layer that collapses the graph of the music excerpt to a vector. Classification is then applied to that vector, where the classes are the predefined composers.
Yet again, MusGConv is easy to apply by replacing the GNN convolutional blocks. In the experiments, using MusGConv was indeed very beneficial for this task. My intuition is that relative features, in combination with the absolute ones, give better insights into compositional style.
Roman numeral analysis is a method for harmonic analysis in which chords are represented as Roman numerals. Predicting Roman numerals is a fairly complex task. Previous architectures used a mixture of GNNs and sequential models. Additionally, Roman numeral analysis is a multi-task classification problem: a Roman numeral is typically broken down into individual, simpler tasks in order to reduce the vocabulary of unique Roman numeral classes. Finally, the graph-based architecture for Roman numeral analysis also includes an onset contraction layer after the graph convolution that transforms the graph into an ordered sequence; it contracts groups of notes that occur at the same time, which are then assigned the same label during classification. Therefore, it can be viewed as a subgraph classification task. Explaining this model in full would merit its own post, so I suggest reading the paper for more insights.
Nevertheless, the general graph pipeline in the figure is still applicable. The sequential models, the multi-task classification process, and the onset contraction module all belong to the task-specific box. However, replacing the graph convolutional blocks with MusGConv blocks does not seem to have an effect on this task and architecture. I attribute this to the task and the model architecture simply being too complex.
Finally, let's discuss cadence detection. Detecting cadences can be viewed as similar to detecting phrase endings, and it is an important aspect of music analysis. Previous methods for cadence detection employed an encoder-decoder GNN architecture. Each note, which by now we know corresponds to one node in the graph, is classified as being a cadence note or not. The cadence detection task includes many peculiarities, such as very heavy class imbalance and annotation ambiguities. If you are interested, I would again suggest checking out this paper.
The use of MusGConv convolution in the encoder can be beneficial for detecting cadences. I believe the combination of relative and absolute features in the design of MusGConv can keep track of voice-leading patterns that often occur around cadences.
Extensive experiments have shown that MusGConv can outperform state-of-the-art models across the aforementioned music understanding tasks. The table below summarizes the improvements:
However soulless a table can be, I prefer not to get into any more detail, in the spirit of keeping this blog post lively and conversational. I therefore invite you to check out the original paper for more details on the results and datasets.
MusGConv is a graph convolutional block for music. It offers a simple, perception-inspired approach to graph convolution that results in performance improvements when GNNs are applied to music understanding tasks. Its simplicity is the key to its effectiveness. In some tasks it is very beneficial, in others not so much. The inductive bias of combining relative and absolute features in music is a neat trick to improve your GNN results, but my advice is to always take it with a pinch of salt. Try out MusGConv by all means, but do not forget about all the other graph convolutional block possibilities.
If you are interested in trying MusGConv, the code and models are available on GitHub.
All images in this post are by the author. I would like to thank Francesco Foscarin, my co-author on the original paper, for his contributions to this work.
Ph.D. Student at Johannes Kepler University
This article concerns selected issues related to the representation of process information in graphical form to develop a comprehensive User Interface. It presents XAML Domain-Specific Language as a description of the user interface.
In this article, we continue the series dedicated to discussing selected issues related to the representation of process information in graphical form. The main goal is to address selected topics in the context of graphics, which is used as a kind of control panel for the business process. It is the third article related to GUI development; if you are interested in this topic, you may also want to check out the previous articles.
The discussion is backed by the example code gathered in the GitHub repository. To follow the discussion, open the ExDataManagement.sln solution in MS Visual Studio(TM). All examples are available under the 5.13-Juliet tag and have been added to the GraphicalData folder.
In this article:
- Partial class, conversion of XAML to CSharp, XAML semantics, rendering types
- Program bootstrap
This article is a contribution to Programming in Practice External Data topics. A sample program backs all topics.
An image is a composition of colored pixels. They must be composed in such a way as to represent selected process information, i.e. its state or behavior. Similarly to the case of data residing in memory, which we do not process by directly referring to their binary representation, we do not create a Graphical User Interface (GUI for short) by laboriously assembling pixels into a coherent composition. Moreover, the GUI is a dashboard controlling the process, so it must also behave dynamically, including enabling data entry and triggering commands.
In a computer-centric environment, generating such graphics requires a formal description. In this article, a dedicated domain-specific language called Extensible Application Markup Language (XAML for short) is examined. By design, it is used to describe formally what we see on the screen. A new language may sound disturbing, especially since learning this language is beyond the scope of this publication. Fortunately, in-depth knowledge of it is not required and is not a necessary condition to understand any of the topics in question. The main goal is to examine selected topics related to generating a graphical user interface from its formal description, which we programmers can somehow integrate into the entire program.
However, how do we ensure the appropriate level of abstraction, i.e. hide the details related to rendering the image without losing the ability to keep it under control? As usual, for our considerations to be based on practical examples, we must use a specific technology; I chose the Windows Presentation Foundation (WPF). Here, technology refers to the tools, techniques, and processes used to design, develop, test, and maintain software systems, encompassing programming languages, development tools, frameworks and libraries, best-practice rules, patterns, and concepts. Still, I will try to ensure that we do not lose the generality of the considerations regardless of this selection. An important component of this technology is the XAML language, which we will use to achieve an appropriate level of abstraction. Hopefully, we will stay as close as possible to the practice of using the CSharp language to deploy a Graphical User Interface.
Previously we described how to use the independent Blend program while designing the UI appearance. After finishing work in Blend, we can return to creating the program text, i.e. return to Visual Studio. Blend is an independent program that can be executed using the operating system interface, including the file browser context menu. It is independent provided that the results of its work can be uploaded to the repository as an integrated part of the entire program, with the history of its changes tracked. This is only possible if its output is text, a demand that today must be followed without compromise. This is why binary graphic formats such as GIF, JPG, and PowerPoint files, to name only a few, are generally a bad idea for defining the appearance of the GUI.
Let's see how this postulate is implemented in the proposed scenario. After returning to Visual Studio, we notice that one of the files has changed. Opening it in the editor, we see that it is a file with XML syntax, i.e. a text file, although a corresponding image also exists. Let's close the image and focus on the text itself, keeping in mind that the image-text relationship exists. Going to the folder where this file is located, we could analyze its changes, but I suggest not wasting time examining the changes in the file itself. It is better to spend this time understanding the content and role of this document as part of our program. So let's go back to Visual Studio.
Probably the first surprise is that instead of CSharp we have XML. There are at least two reasons for this.
From the point of view of graphic design, the fact that we are dealing with XML should not worry us much. All that is needed is for people who know colors and shapes to give us the generated file, which we attach to the program, and Visual Studio does the rest. Unfortunately, this approach is too good to be true. This whole elaborate plan comes down to the fact that sooner or later - and, as we can guess, rather sooner - we have to start talking about integrating the image with program data and behavior, which is what we are paid for. We define data, i.e. sets of allowed values and the operations performed on them, using types, so we need to start talking about types. Hence we must learn more about the meaning of this XML document.
Further examination of the XML document may start by noticing a seemingly trivial fact: the XML file is coupled with another file with the extension .cs. After opening it, we recognize CSharp text. Moreover, we see the word partial in the header of a class, so we are dealing with a partial definition of a type. Maybe these two files create one definition. This only makes sense if the parts are written in the same language, with the same syntax and semantics. In the case under consideration, this is not met; trying to merge text documents compliant with different languages must lead to a result that is not compliant with any language. Our suspicions are confirmed because, as we can see, the first element of this XML file contains the class attribute with the name of the coupled partial class.
Therefore, we can consider it very likely that a document written in a certain language based on XML syntax is converted to the CSharp language. After this conversion, the parts can be merged into one unified class definition, and we can return to the well-known world of programming in CSharp. We call this new language XAML. According to the scenario presented here, we do not need to know this language, and that would be true as long as only a static image is to be created. However, we need to bring it to life, i.e. visualize the process state and behavior: display process data, enable data editing, and respond to user commands. We can be reassured by the fact that, in addition to the XAML part, we have a part in CSharp, called code-behind. Additionally, if the compiler can convert XAML to CSharp, maybe we can write everything in CSharp right away. The answer to the question of whether it is possible not to use XAML is positive, so the temptation is great. Unfortunately, this approach is costly. Before estimating the costs, we need to understand where they come from, remembering that we have three options: only Blend, only CSharp, or some combination of them.
To estimate the previously mentioned costs of converting XAML to CSharp and to better understand the mechanisms of the environment, we need to look at what the compiler does based on the analysis of the program text. Let's do a short analysis without going into details. In the class constructor, we find a call to the InitializeComponent method, which - at first glance - is not defined anywhere, yet the compiler does not report an error. Let's launch the program with a breakpoint just before the InitializeComponent call. It works, so after breaking execution we can select "Step Into" from the Debug menu to enter the method. We can see that the compiler automatically generates this text, but it does not contain a simple conversion of the XAML text to CSharp; instead, it passes the path of the XAML file to the LoadComponent method.
The implementation of this method is provided by the library, but from its description we learn that it creates all relevant objects using reflection. Reflection is a heavyweight mechanism, and this is where the costs come from. Without reflection, error-free conversion of XAML to CSharp is generally impractical or even impossible.
The syntax and semantics of XML files defined by the specification are not sufficient to explain the meaning of the document. Let's try to explain what the word Grid means in a snippet of XAML text taken from an example in the repository. From the context menu, we can go to the definition of this identifier and see that an additional tab opens with the definition of a class of the same name. This class has a parameter-less constructor, which lets us guess the meaning of this XML element: call the parameter-less constructor and, consequently, create and initialize an object of this class. Analyzing the subsequent elements and attributes of this XML file, we see that they refer to the properties of this class.
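The "element name means constructor, attribute means property" interpretation, together with the reflection-based loading described above, can be illustrated with a small Python analogy (this is not how WPF is implemented; the Grid class and load_component function here are stand-ins of my own):

```python
# Python analogy of what a XAML loader does: read markup, look up a class
# whose name matches the element ("reflection"), call its parameter-less
# constructor, and assign attribute values to same-named properties.

import xml.etree.ElementTree as ET

class Grid:
    def __init__(self):                  # parameter-less constructor
        self.Width = 0
        self.Background = "White"

def load_component(markup, registry):
    root = ET.fromstring(markup)
    cls = registry[root.tag]             # resolve the type by name
    obj = cls()                          # create and initialize an object
    for name, value in root.attrib.items():
        setattr(obj, name, value)        # attributes become property values
    return obj

grid = load_component('<Grid Width="200" Background="Blue" />', {"Grid": Grid})
print(grid.Background)  # -> Blue
```

Note that attribute values arrive as strings; a real loader also performs type conversion before the assignment, which is one more job reflection has to do.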
Coupling controls with data.
Let's look at an example where the TextBox control is used. Its task is to expose text, i.e. a stream of characters, on the screen. The current value, i.e. what is on the screen, is provided via the Text property, which by design allows reading and writing string values. The equals sign after the Text property identifier must mean: "transfer the current value to/from the selected place". We already know that the selected place must be a property of some object. The word Binding means that it is somehow attached to ActionText. Hence, the ActionText identifier is probably the name of a property defined in one of the custom types. Let's find this type using the Visual Studio context menu navigation. As we can see, it works, and the property has the expected name.
As you can notice, the navigation works, so Visual Studio has no doubts about which type's instance this property comes from. If Visual Studio knows it, we should know it too. The answer is in the three lines of the MainWindow DataContext definition in the XAML.
Let's start with the middle line, which contains a full class name; the namespace has been replaced by the vm alias defined a few lines above. The class definition was opened as a result of the earlier navigation to the property containing the text for the TextBox control. Let's consider what the class name means here. For the sake of simplicity, let's first look up the meaning of the DataContext identifier. It is the name of a property of the object type, and since object is the base type of all types and DataContext is a property, we can read from it or assign a new value to it. Having discarded all the absurd propositions, it is easy to guess that the MainViewModel identifier here denotes a parameter-less constructor of the MainViewModel type, and this entire fragment should be recognized as assigning a newly created instance of the MainViewModel type to the DataContext property. In other words, it is equivalent to the statement: DataContext = new MainViewModel();
Finally, at run-time, we can consider this object a source and repository of the process data used by the user interface. From a data point of view, it creates a kind of mirror of what is on the screen.
Let's go back to the previous example with the TextBox control and couple its Text property with the ActionText property from the class whose instance reference is assigned to the DataContext property. Here, the magic word Binding may be recognized as a virtual connection that transfers values between interconnected properties. When asked how this happens and what the word Binding means, i.e. when asked about the semantics of this notation, I usually receive an answer like: "It is some magic wand, which should be read as an internal implementation of WPF, and Binding is a keyword of the XAML language." This explanation would be sufficient, although it is a colloquialism and a simplification. Unfortunately, we need to understand at least when this transfer is undertaken. The answer to this question is fundamental to understanding the requirements for the classes used to create the object whose reference is assigned to the DataContext property. The main goal is to keep the screen up to date. To find the answer, let's go to the definition of the Binding identifier using the context menu or the F12 key.
It turns out that Binding is the identifier of a class, or rather the constructor of that class. This must mean that at this point the magic wand is an instance of the Binding class, responsible for transferring values from one property to another. The properties defined in the Binding type can be used to control how this transfer is performed. Since this object must operate on unknown types, reflection is used, which is why this mechanism is rarely analyzed in detail. The colloquial explanation given previously, that the transfer is somehow carried out, is quite common because it has its advantages in the context of describing the effect.
The AttachedProperty class definition simulates this reflection-based action providing the functionality of assigning a value to the indicated property of an object whose type is unknown.
Using the properties defined in the Binding type, we can parameterize the transfer process and, for example, limit its direction. The operations described by the XAML text are performed once, at the beginning of the program, when the MainWindow instance is created. Therefore, we cannot specify here the point in time when the transfer should be carried out. To determine when an instance of the Binding type should trigger the transfer, let's look at the structure of the ActionText property in the MainViewModel type. Here we see that the setter (used to update the current value) invokes two additional methods. In the context of the main problem, the RaisePropertyChanged method is invoked; it activates the PropertyChanged event required to implement the INotifyPropertyChanged interface.
This event is used by objects of the Binding class to trigger the transfer of the current value. As a result of activating this event, the methods whose delegates have been added to the PropertyChanged event are called. If the class does not implement this interface, or the PropertyChanged event is not activated, a new value assigned to a property will not be pulled and transferred to the bound property of a control. As a result, the screen will not be refreshed; it will remain static.
It is a typical communication pattern in which the MainViewModel instance announces that a selected value has changed, and the MainWindow instance pulls the new value and displays it. In this communication, the MainViewModel has the publisher role, and the MainWindow is the subscriber. It is worth stressing that the communication is a run-time activity, initiated in the opposite direction compared with the compile-time type relationship. Hence we can recognize it as an inversion of control, or a callback communication method.
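The publish/subscribe mechanism just described can be mimicked in a few lines of Python (an analogy only; the ViewModel class, the screen dictionary, and the property name are illustrative stand-ins, not WPF types):

```python
# Analogy of PropertyChanged: the view model publishes a notification when a
# property changes; a subscriber (playing the Binding role) pulls the new
# value and refreshes the "screen".

class ViewModel:
    def __init__(self):
        self._action_text = ""
        self.property_changed = []       # subscriber callbacks (the "event")

    @property
    def ActionText(self):
        return self._action_text

    @ActionText.setter
    def ActionText(self, value):
        self._action_text = value
        for handler in self.property_changed:   # RaisePropertyChanged
            handler("ActionText")

screen = {}                              # stands in for the rendered control
vm = ViewModel()
# the subscriber pulls the current value when notified (inversion of control)
vm.property_changed.append(lambda name: screen.update({name: getattr(vm, name)}))

vm.ActionText = "Hello"
print(screen)  # -> {'ActionText': 'Hello'}
```

If the setter never raised the notification, the screen dictionary would stay empty, which is exactly the "static screen" failure mode described above.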
The analysis of the previous examples shows the mechanism synchronizing the screen content with changes in the property values of classes dedicated to providing data for the GUI. Now we need to explain the sequence of operations carried out as a consequence of a command issued through the user interface, e.g. clicking an on-screen button. We have an example here, and its Command property has been associated, as before, with something bearing the identifier ShowTreeViewMainWindowCommend. Using navigation in Visual Studio, we can go to the definition of this identifier and notice that it is again a property in the MainViewModel class, but of the ICommand type. This time, the binding is not used to copy a property value but to convert a click on an on-screen button, e.g. using a mouse, into a call to the Execute operation defined in the ICommand interface, which must be implemented by the class used to create the object assigned to this property.
For the sake of simplicity, the ICommand interface is implemented by a helper class called RelayCommand. In the constructor of this class, you place a delegate to the method to be called when the command executes. The second constructor is helpful in dynamically changing the state of a button on the screen: it can block future events, i.e. realize a state machine. This is exactly the scenario implemented in the examined example program. Please note that the RaiseCanExecuteChanged method was omitted in the previous explanation.
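The shape of such a helper class can be sketched in Python (an analogy of the pattern, not the actual RelayCommand implementation; it also omits the CanExecuteChanged notification mentioned above):

```python
# Analogy of a RelayCommand: wraps a delegate to execute plus an optional
# predicate that tells the UI whether the command (button) is enabled.

class RelayCommand:
    def __init__(self, execute, can_execute=None):
        self._execute = execute
        self._can_execute = can_execute or (lambda: True)

    def can_execute(self):
        return self._can_execute()

    def execute(self):
        if self.can_execute():           # the state machine gate
            self._execute()

log = []
armed = [False]                          # mutable flag standing in for UI state
cmd = RelayCommand(lambda: log.append("shown"), lambda: armed[0])

cmd.execute()            # blocked: the predicate is False, button "disabled"
armed[0] = True
cmd.execute()            # now the delegate runs
print(log)  # -> ['shown']
```

The two-argument form corresponds to the second constructor mentioned above; with only the first argument, the command is always executable.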
It may sound mysterious at first, but the fact that the graphical user interface is an element of the program is obvious to everyone. It is less obvious that it is not an integral part of the executing program process. Let's look at the diagram below, where the GUI appears as something external to the program, just like streaming and structured data. This interface can even be deployed on another physical machine, in which case the need for communication between machines must also be considered.
As a result, we must look at the user interface and the running program as two independent entities operating in asynchronous environments. The problem, then, is how to synchronize the interface's content and behavior with the program flow.
In object-oriented programming, launching a program must cause the instantiation and initialization of a first object. Its constructor therefore contains the first instruction executed by the operating system process that serves as a platform for running the program. This raises the question of how to find this object.
Each project contains a configuration file. In the project, its content can be read using the context menu. There is a place where the Startup Object may be selected. There is only one to choose from, and its name syntax resembles a type name.
Since this is an automatically generated but custom type, it is worth asking how the development environment selects types to this list. Could there be more items on this list?
Since this is the Startup Object, the identifier in the dropdown must be a class name. We find the App type in the Solution Explorer tree. After opening it, we see that it is XAML-compliant text coupled with a CSharp file. This is another example of a partial class written in two languages, so we expect XAML-to-CSharp conversion and text merging. In this definition of the App type, we can find a reference to another XAML file, namely an assignment to the StartupUri property pointing to the MainWindow.xaml file, which contains the definition of the graphical user interface, often called a shell.
It is worth noting that this class inherits from the Application class. The definition of this class is practically empty; it does not even have a constructor, which means that the default constructor is executed and does nothing. However, this allows you to define your own parameter-less constructor. You can also override selected methods from the base class to adapt the behavior to the program's individual needs. Using the mentioned language constructs, we can place required auxiliary activities here, before implementing business logic. Typical examples are preparing the infrastructure related to program execution tracing, calling the Dispose operation for all objects that require it before the program ends, creating additional objects related to business logic, or preparing the infrastructure for dependency injection.
An image is a composition of colored pixels that must represent selected process information, i.e. its state and behavior. We do not create a Graphical User Interface (GUI for short) by laboriously assembling pixels into a coherent composition; generating such graphics requires a formal description. A dedicated domain-specific language called Extensible Application Markup Language (XAML for short) is examined in this article. By design, it is used to formally describe what we see on the screen and the interoperability of the user interface. This language is based on XML, so the first question is why XML and what the difference is between XAML and XML. The XML-based documents must somehow be integrated with their CSharp counterparts; based on XML syntax, XAML semantics, and partial definitions, a consistent program is generated. To make the user interface interoperable, the rendered controls are bound to the process data. Last but not least, the article addresses bootstrapping the application.
This article is part of a series dedicated to discussing selected issues related to representing process information in graphical form. If you are interested in this topic, check out the previous articles listed in the "See also" section. The main goal of covering selected topics (GUI, MVVM, XAML, Binding, and Communication, to name only a few) is to improve understanding of the user interface's design process, making development faster, cheaper, and more portable. The mentioned technology is used only to ensure the engineering level of the discussion; the discussed topics are independent of the technology used, and similar problems are encountered elsewhere.
The graph data structure is a non-linear data structure consisting of vertices and edges. It is useful in fields such as social network analysis, recommendation systems, and computer networks. In the field of sports data science, a graph data structure can be used to analyze and understand the dynamics of team performance and player interactions on the field.
Components of a graph data structure
A graph is a non-linear data structure consisting of vertices and edges. The vertices are sometimes also referred to as nodes, and the edges are lines or arcs that connect any two nodes in the graph. More formally, a graph is composed of a set of vertices (V) and a set of edges (E), and is denoted by G(V, E).
Imagine a game of football as a web of connections, where players are the nodes and their interactions on the field are the edges. This web of connections is exactly what a graph data structure represents, and it’s the key to unlocking insights into team performance and player dynamics in sports.
Types of graphs:

- Null graph: a graph with no edges.
- Undirected graph: edges have no direction; each edge is an unordered pair of nodes.
- Directed graph: each edge has a direction; each edge is an ordered pair of nodes.
- Connected graph: every node can be reached from every other node.
- Disconnected graph: at least one node is not reachable from some other node.
- K-regular graph: every vertex has degree exactly K.
- Cycle graph: the graph forms a single cycle; every vertex has degree 2.
- Cyclic graph: a graph containing at least one cycle.
- Directed acyclic graph (DAG): a directed graph that does not contain any cycle.
- Bipartite graph: the vertices can be divided into two sets such that no edge connects two vertices within the same set.
There are two common ways to store a graph: an adjacency matrix and an adjacency list.
Adjacency matrix: the graph is stored as a 2D matrix whose rows and columns denote vertices. Each entry in the matrix represents the weight of the edge between those vertices (or 1/0 for an unweighted graph).
Below is the implementation of Graph Data Structure represented using Adjacency Matrix:
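The original code listing is not reproduced here; a minimal Python sketch of an adjacency-matrix graph (assuming an unweighted, undirected graph with vertices numbered 0 to V-1; the class and method names are illustrative) could look like this:

```python
class GraphMatrix:
    """Undirected, unweighted graph stored as a V x V adjacency matrix."""

    def __init__(self, num_vertices):
        self.V = num_vertices
        # matrix[i][j] == 1 means there is an edge between vertex i and vertex j
        self.matrix = [[0] * num_vertices for _ in range(num_vertices)]

    def add_edge(self, u, v):
        self.matrix[u][v] = 1
        self.matrix[v][u] = 1  # undirected: mirror the entry

    def remove_edge(self, u, v):
        self.matrix[u][v] = 0
        self.matrix[v][u] = 0

    def has_edge(self, u, v):
        return self.matrix[u][v] == 1


g = GraphMatrix(4)
g.add_edge(0, 1)
g.add_edge(1, 2)
print(g.has_edge(0, 1))  # True
print(g.has_edge(0, 3))  # False
```

Note that edge insertion, removal, and lookup are all O(1), while the matrix itself always occupies O(V*V) space regardless of the number of edges.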
Adjacency list: the graph is represented as a collection of lists. There is an array of lists in which the list at index i holds the vertices adjacent to vertex i, i.e., the edges incident on that vertex.
Below is the implementation of Graph Data Structure represented using Adjacency List:
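Again, the original listing is not preserved here; an illustrative Python sketch of an adjacency-list graph (undirected and unweighted, with hypothetical names) might be:

```python
from collections import defaultdict


class GraphList:
    """Undirected, unweighted graph stored as an adjacency list."""

    def __init__(self):
        # adj[v] is the list of vertices adjacent to v
        self.adj = defaultdict(list)

    def add_edge(self, u, v):
        self.adj[u].append(v)
        self.adj[v].append(u)  # undirected: record both directions

    def neighbors(self, v):
        return self.adj[v]


g = GraphList()
g.add_edge(0, 1)
g.add_edge(0, 2)
print(g.neighbors(0))  # [1, 2]
```

The list representation uses O(V + E) space, which is why it is preferred for sparse graphs.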
When the graph contains a large number of edges (a dense graph), it is reasonable to store it as a matrix, because only a few entries in the matrix will be empty. Algorithms such as Prim's and Dijkstra's have lower complexity on dense graphs when driven by an adjacency matrix.
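To illustrate that point, here is a sketch of the textbook O(V^2) Dijkstra's algorithm driven directly by an adjacency matrix. This is a generic illustration, not code from the original article; `dijkstra_matrix` is an assumed name, and an entry of 0 is treated as "no edge":

```python
def dijkstra_matrix(matrix, source):
    """Shortest distances from `source` over a weighted adjacency matrix.

    matrix[i][j] is the edge weight, or 0 if there is no edge. Runs in
    O(V^2), which suits dense graphs where E approaches V^2.
    """
    V = len(matrix)
    INF = float("inf")
    dist = [INF] * V
    dist[source] = 0
    visited = [False] * V

    for _ in range(V):
        # pick the unvisited vertex with the smallest tentative distance
        u = min((v for v in range(V) if not visited[v]),
                key=lambda v: dist[v], default=None)
        if u is None or dist[u] == INF:
            break
        visited[u] = True
        for v in range(V):
            w = matrix[u][v]
            if w and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist


# triangle: 0-1 weight 2, 1-2 weight 3, 0-2 weight 7
m = [[0, 2, 7],
     [2, 0, 3],
     [7, 3, 0]]
print(dijkstra_matrix(m, 0))  # [0, 2, 5]
```

With a heap-based implementation over an adjacency list the cost is O((V + E) log V) instead, which is why the matrix variant only wins when the graph is dense.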
Action | Adjacency Matrix | Adjacency List
---|---|---
Adding an edge | O(1) | O(1)
Removing an edge | O(1) | O(N)
Initializing | O(N*N) | O(N)
The basic operations on a graph include adding and removing vertices and edges, checking whether an edge exists, and traversing the graph.
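As a sketch (not from the original article, names illustrative), these basic operations on an adjacency-list representation might look like:

```python
class Graph:
    """Minimal undirected graph supporting the basic operations."""

    def __init__(self):
        self.adj = {}  # vertex -> set of neighbours

    def add_vertex(self, v):
        self.adj.setdefault(v, set())

    def add_edge(self, u, v):
        self.add_vertex(u)
        self.add_vertex(v)
        self.adj[u].add(v)
        self.adj[v].add(u)

    def remove_edge(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def remove_vertex(self, v):
        # detach v from every neighbour, then drop its own entry
        for nbr in self.adj.pop(v, set()):
            self.adj[nbr].discard(v)

    def has_edge(self, u, v):
        return v in self.adj.get(u, set())
```

Using sets rather than lists for the neighbour collections makes edge lookup and removal O(1) on average, at the cost of losing insertion order.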
A tree is a restricted type of graph, with some additional rules: every tree is a graph, but not every graph is a tree. Linked lists, trees, and heaps are all special cases of graphs.
Graphs have numerous real-life applications across various fields, including social networks, transportation networks, computer networks, recommendation systems, and biology.
Frequently asked questions

What is a graph?
A graph is a data structure consisting of a set of vertices (nodes) and a set of edges that connect pairs of vertices.
Graph Data Structure can be classified into various types based on properties such as directionality of edges (directed or undirected), presence of cycles (acyclic or cyclic), and whether multiple edges between the same pair of vertices are allowed (simple or multigraph).
Graph Data Structure has numerous applications in various fields, including social networks, transportation networks, computer networks, recommendation systems, biology, chemistry, and more.
In an undirected graph, edges have no direction, meaning they represent symmetric relationships between vertices. In a directed graph (or digraph), edges have a direction, indicating a one-way relationship between vertices.
A weighted graph is a graph in which each edge is assigned a numerical weight or cost. These weights can represent distances, costs, or any other quantitative measure associated with the edges.
The degree of a vertex in a graph is the number of edges incident to that vertex. In a directed graph, the indegree of a vertex is the number of incoming edges, and the outdegree is the number of outgoing edges.
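For example, degrees can be computed directly from an edge list; the following sketch (illustrative names, using Python's `collections.Counter`) handles both the undirected and the directed case:

```python
from collections import Counter


def degrees(edges, directed=False):
    """Per-vertex degree; for directed graphs, a (indegree, outdegree) pair.

    edges is a list of (u, v) pairs.
    """
    if not directed:
        deg = Counter()
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        return deg
    indeg, outdeg = Counter(), Counter()
    for u, v in edges:
        outdeg[u] += 1  # edge leaves u
        indeg[v] += 1   # edge enters v
    return indeg, outdeg


edges = [(0, 1), (0, 2), (1, 2)]
print(degrees(edges)[0])  # 2
```

A useful sanity check is the handshaking lemma: the sum of all degrees in an undirected graph equals twice the number of edges.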
A path in a graph is a sequence of vertices connected by edges. The length of a path is the number of edges it contains.
A cycle in a graph is a path that starts and ends at the same vertex, traversing a sequence of distinct vertices and edges in between.
A spanning tree of a graph is a subgraph that is a tree and includes all the vertices of the original graph. A minimum spanning tree (MST) is a spanning tree with the minimum possible sum of edge weights.
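As an illustration of the MST definition, here is a sketch of the classic Prim's algorithm with a binary heap (generic code, not from the original article) that returns the total weight of a minimum spanning tree of a connected weighted graph:

```python
import heapq


def prim_mst_weight(adj, start=0):
    """Total weight of a minimum spanning tree of a connected graph.

    adj: dict mapping vertex -> list of (neighbour, weight) pairs.
    """
    visited = {start}
    # heap of candidate edges (weight, target) leaving the visited set
    heap = [(w, v) for v, w in adj[start]]
    heapq.heapify(heap)
    total = 0
    while heap and len(visited) < len(adj):
        w, v = heapq.heappop(heap)
        if v in visited:
            continue  # stale edge into the tree; skip it
        visited.add(v)
        total += w
        for nbr, nw in adj[v]:
            if nbr not in visited:
                heapq.heappush(heap, (nw, nbr))
    return total


adj = {
    0: [(1, 1), (2, 4)],
    1: [(0, 1), (2, 2)],
    2: [(0, 4), (1, 2)],
}
print(prim_mst_weight(adj))  # 3  (edges 0-1 and 1-2)
```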
Common graph traversal algorithms include depth-first search (DFS) and breadth-first search (BFS). These algorithms are used to explore or visit all vertices in a graph, typically starting from a specified vertex. Other algorithms, such as Dijkstra’s algorithm and Bellman-Ford algorithm, are used for shortest path finding.
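The two traversal orders can be sketched in a few lines of Python (illustrative code, assuming the graph is given as an adjacency-list dict):

```python
from collections import deque


def bfs(adj, start):
    """Breadth-first order of vertices reachable from `start`."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for nbr in adj[v]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order


def dfs(adj, start, seen=None):
    """Depth-first order of vertices reachable from `start` (recursive)."""
    if seen is None:
        seen = set()
    seen.add(start)
    order = [start]
    for nbr in adj[start]:
        if nbr not in seen:
            order.extend(dfs(adj, nbr, seen))
    return order


adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(bfs(adj, 0))  # [0, 1, 2, 3]
print(dfs(adj, 0))  # [0, 1, 3, 2]
```

The only structural difference is the frontier: BFS uses a FIFO queue and so visits vertices level by level, while DFS follows one branch to its end before backtracking.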
Title: Bounding Boxes and Probabilistic Graphical Models: Video Anomaly Detection Simplified
Abstract: In this study, we formulate the task of Video Anomaly Detection as a probabilistic analysis of object bounding boxes. We hypothesize that representing objects via their bounding boxes alone can be sufficient to successfully identify anomalous events in a scene. The implied value of this approach is increased object anonymization, faster model training, and fewer computational resources. This can particularly benefit applications within video surveillance running on edge devices such as cameras. We design our model based on human reasoning, which lends itself to explaining model output in human-understandable terms. Meanwhile, the slowest model trains in less than 7 seconds on an 11th Generation Intel Core i9 processor. While our approach constitutes a drastic reduction of the problem feature space in comparison with prior art, we show that this does not result in a reduction in performance: the results we report are highly competitive on the benchmark datasets CUHK Avenue and ShanghaiTech, and significantly exceed the latest state-of-the-art results on StreetScene, which has so far proven to be the most challenging VAD dataset.
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Patient health prediction is among the most critical areas of medical research. Several prediction models exist to predict a patient's health condition; however, relevant results have not been attained because of poor data quality. IoT-sensed data contains considerable noise, which increases the complexity of health prediction. These shortcomings result in low prediction and performance scores. The proposed work therefore aims to develop a novel Coati-based Recurrent Digital Twin Framework (CbRDTF) to predict patients' health conditions. The novelty of this research lies in combining coati optimization and a recurrent network with a digital twin for health prediction. Initially, the IoT-sensed data is imported and preprocessed, and meaningful features are selected. The health condition of the patients is then predicted and classified. The coati function incorporated at the classification layer extracts the relevant features from the sensed medical data for robust prediction and also tunes the parameters of the recurrent digital twin to improve prediction and classification accuracy. Finally, performance was measured: the presented model attained a high score of 99.81% in prediction accuracy, recall, F-value, and precision, with a computation time of 611.81 s.
Sobhana, M., Ch, S.C., Koneru, S. et al. Enhancement of patient's health prediction system in a graphical representation using digital twin technology. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19759-8
Received: 03 July 2023; Revised: 08 April 2024; Accepted: 23 June 2024; Published: 10 July 2024
Scientific Reports, volume 14, Article number: 15537 (2024)
Crop yield could be enhanced if various plant nutrition deficiencies and diseases were identified and detected at early stages. Hence, continuous health monitoring of plants is crucial for handling plant stress. Deep learning methods have proven their superior performance in the automated detection of plant diseases and nutrition deficiencies from visual symptoms in leaves. This article proposes a new deep learning method for plant nutrition deficiency and disease classification using a graph convolutional network (GCN) added upon a base convolutional neural network (CNN). Sometimes a global feature descriptor fails to capture the vital region of a diseased leaf, which causes inaccurate classification of the disease. To address this issue, regional feature learning is crucial for holistic feature aggregation. In this work, region-based feature summarization at multiple scales is explored using spatial pyramidal pooling for discriminative feature representation. Furthermore, a GCN is developed to enable the learning of finer details for classifying plant diseases and nutrient insufficiency. The proposed method, called Plant Nutrition Deficiency and Disease Network (PND-Net), has been evaluated on two public datasets for nutrition deficiency and two for disease classification using four backbone CNNs. The best classification performances of the proposed PND-Net are as follows: (a) 90.00% on Banana and 90.54% on Coffee nutrition deficiency; and (b) 96.18% on Potato diseases and 84.30% on PlantDoc datasets using the Xception backbone.
Furthermore, additional experiments have been carried out for generalization, and the proposed method has achieved state-of-the-art performances on two public datasets, namely breast cancer histopathology image classification (BreakHis 40\(\times\): 95.50% and BreakHis 100\(\times\): 96.79% accuracy) and single cells in Pap smear images for cervical cancer classification (SIPaKMeD: 99.18% accuracy). The proposed method has also been evaluated using five-fold cross-validation and achieved improved performances on these datasets. Clearly, the proposed PND-Net effectively boosts the performance of automated health analysis of various plants in real and intricate field environments, implying PND-Net's aptness for agricultural growth as well as human cancer classification.
Introduction
Agricultural production plays a crucial role in the sustainable economic and societal growth of a country. High-quality crop yield production is essential for satisfying global food demands and better health. However, several key factors, such as environmental barriers, pollution, and climate change, adversely affect crop yield and quality. Nevertheless, poor soil-nutrition management causes severe plant stress, leading to different diseases and resulting in a substantial financial loss. Thus, plant nutrition diagnosis and disease detection at an early stage is of utmost importance for overall health monitoring of plants 1 . Nutrition management in agriculture is a decisive task for maintaining the growth of plants. In recent times, it has been witnessed the success of machine learning (ML) techniques for developing decision support systems over traditional manual supervision of agricultural yield. Moreover, nutrient management is critical for improving production growth, focusing on a robust and low-cost solution. Intelligent automated systems based on ML effectively build more accurate predictive models, which are relevant for improving agricultural production.
Nutrient deficiency in plants exhibits certain visual symptoms and may cause poor crop yields 2 . Diagnosis of plant nutrient inadequacy using deep learning and related intelligent methods is an emerging area in precision agriculture and plant pathology 3 . Automated detection and classification of nutrient deficiencies using computer vision and artificial intelligence have been studied in the recent literature 4 , 5 , 6 , 7 , 8 . Diagnosis of nutrient deficiencies in various plants (e.g., rice, banana, guava, palm oil, apple, lettuce, etc.) is vital because soil ingredients often cannot provide the nutrients required for the growth of plants 9 , 10 , 11 , 12 . Also, early-stage detection of leaf diseases (e.g., potato, rice, cucumber, etc.) and pests is essential to monitor crop yield production 13 . A few approaches to disease detection and nutrient deficiencies in rice leaves have been developed and studied in recent times 14 , 15 , 16 , 17 , 18 . Hence, monitoring plant health, disease, and nutrition inadequacy is a challenging image classification problem in artificial intelligence (AI) and machine learning (ML) 19 .
This paper proposes a deep learning method for plant health diagnosis by integrating a graph convolutional network (GCN) upon a backbone deep convolutional neural network (CNN). The complementary discriminatory features of different local regions of input leaf images are aggregated into a holistic representation for plant nutrition and disease classification. The GCNs were originally developed for semi-supervised node classification 20 . Over time, several variations of GCNs have been developed for graph structured data 21 . Furthermore, GCN is effective for message propagation for image and video data in various applications. In this direction, several works have been developed for image recognition using GCN 22 , 23 . However, little research attention has been given to adopting GCN especially for plant disease prediction and nutrition monitoring 24 . Thus, in this work, we have studied the effectiveness of GCN in solving the current problem of plant health analysis regarding nutrition deficiency and disease classification of several categories of plants.
The proposed method, called Plant Nutrition Deficiency and Disease Network (PND-Net), attempts to establish a correlation between different regions of the leaves for identifying infected and defective regions at multiple granularities. To this end, region pooling in local contexts and spatial pooling in a pyramidal structure have been explored for a holistic feature representation capturing subtle discrimination of plant health conditions. Other existing approaches have built the graph-based correlation directly upon the CNN features, but they have often failed to capture finer descriptions of the input data. In this work, we have integrated two different feature pooling techniques for generating the node features of the graph. This mixing enables an enhanced feature representation, which is further improved by graph layer activations in the hidden layers of the GCN. The effectiveness of the proposed strategy has been analysed with rigorous experiments on two plant nutrition deficiency and two plant disease classification datasets. In addition, the method has been tested on two different human cancer classification tasks to assess its generalization. The key contributions of this work are:
A deep learning method, called PND-Net, is devised by integrating a graph convolutional module upon a base CNN to enhance the feature representation for improving the classification performances of unhealthy leaves.
A combination of fixed-size region-based pooling with multi-scale spatial pyramid pooling progressively enhances the feature aggregation for building a spatial relation between the regions via the neighborhood nodes of a spatial graph structure.
Experimental studies have been carried out for validating the proposed method on four public datasets, which have been tested for plant disease classification, and nutrition deficiency classification. For generalization of the proposed method, a few experiments have been conducted on the cervical cancer cell (SIPaKMeD) and breast cancer histopathology image (BreakHis 40 \(\times \) and 100 \(\times \) ) datasets. The proposed PND-Net has achieved state-of-the-art performances on these six public datasets of different categories.
The rest of this paper is organized as follows: “Related works” summarizes related works. “Proposed method” describes the proposed methodology. The experimental results are showcased in “Results and performance analysis”, followed by the conclusion in “Conclusion”.
Several works have contributed to plant disease detection, most of which were tested on controlled datasets acquired in a laboratory set-up. Only a few works have developed unconstrained datasets considering realistic field conditions, and those have been studied in this work. Here, recent works are briefly reviewed.
Bananas are one of the widely consumed staple foods across the world. An image dataset depicting the visual deficiency symptoms of eight essential nutrients, namely, boron, calcium, iron, potassium, manganese, magnesium, sulphur and zinc has been developed 25 . This dataset has been tested in this proposed work. The CoLeaf dataset contains images of coffee plant leaves and is tested for nutritional deficiencies recognition and classification 26 . The nutritional status of oil palm leaves, particularly the status of chlorophyll and macro-nutrients (e.g., N, K, Ca, and Mg) in the leaves from proximal multi spectral images, have been evaluated using machine learning techniques 27 . The identification and categorization of common macro-nutrient (e.g., nitrogen, phosphorus, potassium, etc.) deficiencies in rice plants has been addressed 17 , 28 . The percentage of micro nutrients deficiencies in rice plants using CNNs and Random Forest (RF) has been estimated 28 . Detection of biotic stressed rice leaves and abiotic stressed leaves caused by NPK (Nitrogen, Phosphorus, and Potassium) deficiencies have been experimented with using CNN 29 .
A supervised monitoring system of tomato leaves has been developed, using a CNN to recognize and classify the type of nutrient deficiency in tomato plants, achieving 86.57% accuracy 30 . Nutrient deficiency symptoms have been recognized in RGB images by using CNN-based (e.g., EfficientNet) transfer learning on orange with 98.52% accuracy and sugar beet with 98.65% accuracy 31 . Nutrient deficiency detection in rice plants has reported 97.0% accuracy by combining CNN and reinforcement learning 32 . The R-CNN object detector has achieved an accuracy of 82.61% for identifying nutrient deficiencies in chili leaves 33 . Feature aggregation schemes combining HSV and RGB features for color, GLCM and LBP for texture, and Hu moments and centroid distance for shape have been examined for nutrient deficiency identification in chili plants 34 . However, this method performed best using a CNN with 97.76% accuracy. An ensemble of CNNs has reported 98.46% accuracy on groundnut plant leaf images 35 . An intelligent robotic system with wireless control to monitor the nutrition essentials of spinach plants in a greenhouse has been evaluated with 86% precision 36 . The nutrient status and health conditions of Romaine lettuce plants in a hydroponic setup using a CNN have been tested with 90% accuracy 37 . The identification and categorization of common macro-nutrient (e.g., nitrogen, phosphorus, potassium) deficiencies in rice plants using pixel-ratio analysis in HSV color space has been evaluated with more than 90% accuracy 17 . A method for estimating leaf nutrient concentrations of citrus trees using unmanned aerial vehicle (UAV) multi-spectral images has been developed and tested with a gradient-boosting regression tree model with moderate precision 38 .
The classification of healthy and diseased citrus leaf images using a CNN on the Platform as a Service (PaaS) cloud has been developed. The method has been tested using pre-trained backbones and a proposed CNN, attaining 98.0% accuracy and a 99.0% F1-score 39 . A modified transfer learning (TL) method using three pre-trained CNNs has been tested for potato leaf disease detection, and DenseNet169 has achieved 99.0% accuracy 40 . Likewise, a CNN-based transfer learning method has been adapted for detecting powdery mildew disease with 98.0% accuracy in bell pepper leaves 41 , and woody fruit leaves with 85.90% accuracy 42 . A two-stage transfer learning method has combined Faster-RCNN for leaf detection and a CNN for maize plant disease recognition in a natural environment, obtaining a 99.70% F1-score 43 . A hybrid model integrating a CNN and random forest (RF) for multi-classifying rice hispa disease into distinct intensity levels has attained an accuracy of 97.46% 44 . An improved YOLOv5 network has been developed for cucumber leaf disease and pest detection, reporting 73.8% precision 13 . A fusion of the VGG16 and AlexNet architectures has attained 95.82% testing accuracy for pepper leaf disease classification 45 . Likewise, disease classification of black pepper has gained 99.67% accuracy using ResNet-18 46 . A ConvNeXt with an attention module, namely CBAM-ConvNeXt, has improved the performance with 85.42% accuracy for classifying soybean leaf disease 47 . A channel-extension residual structure with an adaptive channel attention mechanism and a bidirectional information fusion block has been proposed for leaf disease classification 48 . This technique has achieved 99.82% accuracy on the PlantVillage dataset. A smartphone application has been developed for detecting habanero plant disease, obtaining 98.79% accuracy 49 .
In addition, an ensemble method for a crop monitoring system that identifies plant diseases at early stages using an IoT-enabled system has been presented, with a best precision of 84.6% 50 . A dataset comprising five types of apple orchard disorders has been developed; the best accuracy, tested using a CNN, is 97.3% 51 . A lightweight model based on the ViT structure has been developed for rice leaf disease classification and attained a 91.70% F1-score 52 .
Though several deep learning approaches have been developed for plant health analysis, little progress has been achieved using GCNs for visual recognition of plant diseases 53 . The SR-GNN integrates relation-aware feature representation, leveraging context-aware attention with a GCN module 22 . Cervical cell classification methods have been developed by exploring the potential correlations of clusters through a GCN 54 and feature-rank analysis 55 . On the other hand, fusions of multiple CNNs, transfer learning, and other deep learning methods have been developed for early detection of breast cancer 56 ; this fusion method achieved a 99.0% F1-score on an ultrasound breast cancer dataset using VGG-16. In this work, a GCN-based method has been developed that captures the regional importance of local contextual features to solve plant disease recognition and human cancer image classification challenges.
The proposed method, called PND-Net, combines deep features using a CNN and a GCN in an end-to-end pipeline, as shown in Fig. 1 . First, a backbone CNN computes high-level deep features from input images. Then, a GCN is stacked on top of the CNN to refine the deep features, using region-based pooling and pyramid pooling strategies to capture finer details of contextual regions at multiple scales. Finally, a precise feature map is built to improve performance.
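The stage-wise tensor shapes of this pipeline can be traced with a small pure-Python bookkeeping sketch. All concrete sizes here (a 2048-channel backbone output, ω = 4 regions, 1×1 and 2×2 pyramid bins, 7 classes) are illustrative assumptions, not the authors' exact configuration:

```python
# Illustrative shape trace of the PND-Net pipeline; the concrete sizes are
# assumptions for this sketch, not the exact published configuration.

def pnd_net_shape_trace(h=8, w=8, C=2048, H=16, W=16, omega=4,
                        bins=(1, 2), n_classes=7):
    shapes = []
    shapes.append(("backbone feature map F", (h, w, C)))       # CNN output
    shapes.append(("upsampled feature map", (H, W, C)))        # spatial upsampling
    # omega non-overlapping regions arranged in a square grid
    side = int(omega ** 0.5)
    rh, rw = H // side, W // side
    shapes.append(("region features", (omega, rh, rw, C)))
    shapes.append(("reshaped for SPP", (omega, rh * rw, C)))
    # pyramid pooling with i x i and j x j bins -> P = i^2 + j^2 node features
    P = sum(b * b for b in bins)
    shapes.append(("SPP output / GCN input", (P, C)))
    shapes.append(("GCN output (dimension preserved)", (P, C)))
    shapes.append(("after global average pooling", (C,)))
    shapes.append(("softmax probabilities", (n_classes,)))
    return shapes

for name, shape in pnd_net_shape_trace():
    print(f"{name}: {shape}")
```

Note how the GCN block preserves the node feature dimension, matching the paper's choice of keeping the channel dimension uniform through the pipeline.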
Proposed GCN-based method, PND-Net for visual classification of plant disease and nutrition inadequacy.
GCNs have been widely used across domains and applications such as node classification, edge attribute modeling, citation networks, knowledge graphs, and several other tasks through graph-based representation. A GCN can be formulated by stacking multiple graph convolutional layers with non-linearities upon traditional convolutional layers, i.e., a CNN. In practice, this kind of stacking of GCN layers at a deeper level of a network enhances the model's learning capacity. Moreover, graph convolutional layers are effective in alleviating overfitting and can address the vanishing gradient problem by adopting the normalization trick, which is a foundation of GCN modeling. The widely used multi-layer GCN algorithm proposed by Kipf and Welling 20 has been adopted here. It explores an efficient and fast layer-wise propagation method relying on a first-order approximation of spectral convolutions on graph structures. It is scalable and apposite for semi-supervised node classification on graph-based data. A linear formulation of a GCN can be simplified such that the convolution with filter \(g_{\theta }\) and parameters \(\theta \) at each layer can be optimized with a single parameter. Here, the simplified graph convolution is concisely defined 20 .
The graph Laplacian ( \(\Psi \) ) could further be normalized to mitigate the vanishing gradients within a network.
where the adjacency matrix \(\tilde{\textbf{A}}=\textbf{A}+\textbf{I}_{{P}}\) denotes \(\textbf{A}\) with self-connections, \(\textbf{I}_{{P}}\) is the identity matrix, the degree matrix is \(\tilde{\textbf{D}}_{ii}= \sum _{j} \tilde{\textbf{A}}_{ij}\) , and X is the input data/signal to the graph. The simplified convoluted signal matrix \(\Omega \) is given as
where the input features \(X \in \mathbb {R}^{{P}\times {C}}\) , the filter parameters \(\Theta \in \mathbb {R}^{{C}\times {F}}\) , and \(\Omega \in \mathbb {R}^{{P}\times {F}}\) is the convoluted signal matrix. Here, P is the number of nodes, C is the number of input channels, and F is the number of filters/feature maps. This form of graph convolution (Eq. 3 ) is applied to address the current problem, as described in the proposed-method section below.
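A toy numerical sketch of this simplified convolution, \(\Omega = \tilde{\textbf{D}}^{-1/2}\tilde{\textbf{A}}\tilde{\textbf{D}}^{-1/2}X\Theta \), on a hand-built three-node graph (pure Python; the matrices and sizes are illustrative only):

```python
# Simplified graph convolution (Kipf & Welling style) on a toy 3-node path
# graph. Pure-Python matrix helpers; all values are illustrative.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Adjacency A (no self-loops), then A_tilde = A + I (self-connections)
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
P = len(A)
A_tilde = [[A[i][j] + (1 if i == j else 0) for j in range(P)] for i in range(P)]

# Degree matrix D_tilde and its inverse square root (the normalization trick)
deg = [sum(row) for row in A_tilde]
D_inv_sqrt = [[(deg[i] ** -0.5 if i == j else 0.0) for j in range(P)]
              for i in range(P)]

# Symmetric normalization: A_hat = D^{-1/2} A_tilde D^{-1/2}
A_hat = matmul(matmul(D_inv_sqrt, A_tilde), D_inv_sqrt)

# Input signal X (P x C) and filter parameters Theta (C x F)
X = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]           # C = 2 input channels
Theta = [[1.0], [1.0]]     # F = 1 output feature map

Omega = matmul(matmul(A_hat, X), Theta)   # convoluted signal, shape P x F
print(Omega)
```

The symmetric normalization keeps the propagated activations bounded, which is what mitigates the vanishing/exploding gradient issue mentioned above.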
A standard backbone CNN is used for deep feature extraction: an input leaf image \(I_l\) \(\in \) \(\mathbb {R}^{h\times w\times 3}\) with class label l is passed through the base CNN to extract a feature map \(\textbf{F}\) \(\in \) \(\mathbb {R}^{h\times w\times C}\) , where h , w , and C denote the height, width, and channels, respectively. However, this squeezed high-level feature map is not suitable for describing local non-overlapping regions. Hence, the output base feature map is spatially up-sampled to \(\textbf{F}\) \(\in \) \(\mathbb {R}^{H\times W\times C}\) and \(\omega \) distinct small regions are computed, given as \(\textbf{F}\) \(\in \) \(\mathbb {R}^{\omega \times h\times w\times C}\) . These regions represent complementary information at different spatial contexts. However, because the regions have fixed dimensions, the importance of each region is uniformly distributed, which can be tuned further to extract more distinguishable information. A simple pooling technique can then be applied at multiple scales to enhance the spatial feature representation. To this end, the region-pooled feature vectors are reshaped into an aggregated spatial feature space upon which multi-scale pyramidal pooling is possible. In addition, this kind of feature representation captures the overall spatiality, helping to understand the informative features holistically and solve the current problem.
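The splitting of the upsampled map into ω non-overlapping regions can be sketched as follows (the 2×2 region grid and the toy 4×4 single-channel map are assumptions for illustration):

```python
# Split an H x W x C feature map (nested lists) into omega non-overlapping
# regions arranged in a grid_side x grid_side layout. Sizes are illustrative.

def split_regions(fmap, grid_side=2):
    H, W = len(fmap), len(fmap[0])
    rh, rw = H // grid_side, W // grid_side
    regions = []
    for gi in range(grid_side):
        for gj in range(grid_side):
            region = [row[gj * rw:(gj + 1) * rw]
                      for row in fmap[gi * rh:(gi + 1) * rh]]
            regions.append(region)
    return regions  # omega = grid_side**2 regions, each rh x rw x C

# Toy 4 x 4 map with C = 1 channel: cell (i, j) holds value i*4 + j
fmap = [[[i * 4 + j] for j in range(4)] for i in range(4)]
regions = split_regions(fmap, grid_side=2)
print(len(regions), len(regions[0]), len(regions[0][0]))
```

Each region here is an equal-size tile, which is why the text notes that region importance starts out uniformly distributed.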
The SPP layer was originally introduced to alleviate the fixed-length input constraint of conventional deep networks, which effectively boosted model performance 57 . Generally, an SPP layer is added upon the last convolutional layer of a backbone CNN. This pooling layer generates a fixed-length feature vector and then passes the feature map to a fully connected or classification layer. The SPP enhances feature aggregation at a deeper layer of a network. Most importantly, SPP applies multi-level spatial bins for pooling while preserving the spatial relevance of the feature map. It provides a robust solution, with performance gains across diverse computer vision problems, including plant/leaf image recognition.
A typical region pooling technique loses its spatial information while passing through a global average pooling (GAP) layer to make the features compatible with the GCN. As a result, region pooling followed by a GAP layer aggressively eliminates the informativeness of regions and their correlations, and thus often fails to build an effective feature vector. Inter-region interactions are also ignored when a GAP layer is applied on region-based pooling alone. Therefore, it is essential to correlate inter-region interactions to select essential features, which can be further enriched and propagated through the GCN layer activations.
Our objective is to exploit multi-level pooling at pyramid levels of \(n \times n\) bins on top of the fixed-size regions of the input image. As a result, the spatial relationships between different image regions are preserved, thereby increasing the learning capacity of the proposed PND-Net. The input feature space prior to pyramid pooling is given as \(\textbf{F}^{\omega \times (HW)\times C}\) , derived from \(\textbf{F}^{\omega \times H\times W\times C}\) . It enables the selection of contextual features of neighboring regions (i.e., inter-regions) through pyramid pooling simultaneously. This small adjustment in the spatial dimension of the input features prior to pooling captures the interactions between the local regions of the input leaf image. Experimental results show that pyramidal pooling indeed yields an image classification accuracy gain over region pooling alone.
where \(\delta _i\) and \(\delta _j\) define the window sizes, which pool a total of \(P=(i\times i) + (j\times j)\) feature maps after SPP, given as \(\textbf{F}^{P\times C}\) . These feature maps are then fed into a GCN module, described next. The key components of the proposed method are pictured in Fig. 1 .
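A minimal sketch of this pyramid pooling step, assuming 1×1 and 2×2 bin grids so that \(P = 1 + 4 = 5\) node features are produced from a toy 4×4 single-channel map:

```python
# Spatial pyramid pooling sketch: average-pool a spatial map over bin grids
# of 1x1 and 2x2, yielding P = 1 + 4 = 5 pooled vectors. Bin sizes and the
# toy map are illustrative assumptions.

def avg_pool_bins(fmap, n):
    """Average-pool an H x W x C map (nested lists) into an n x n grid."""
    H, W, C = len(fmap), len(fmap[0]), len(fmap[0][0])
    bh, bw = H // n, W // n
    pooled = []
    for bi in range(n):
        for bj in range(n):
            acc = [0.0] * C
            for i in range(bi * bh, (bi + 1) * bh):
                for j in range(bj * bw, (bj + 1) * bw):
                    for c in range(C):
                        acc[c] += fmap[i][j][c]
            pooled.append([v / (bh * bw) for v in acc])
    return pooled

def spatial_pyramid_pool(fmap, levels=(1, 2)):
    out = []
    for n in levels:
        out.extend(avg_pool_bins(fmap, n))
    return out  # P = sum(n*n for n in levels) feature vectors of length C

fmap = [[[float(i * 4 + j)] for j in range(4)] for i in range(4)]
F_spp = spatial_pyramid_pool(fmap)
print(len(F_spp))
```

The P pooled vectors become the node features of the graph built in the next subsection.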
A graph \(G=({P},E)\) , with P nodes and E edges, is constructed for feature propagation. A GCN is applied to build spatial relations between the features through graph G . The nodes are characterized by deep feature maps, and the output \(\textbf{C}\) holds the convoluted features per node. The edges E are described by an undirected adjacency matrix \(\textbf{A} \in \mathbb {R}^{{P}\times {P}}\) representing node-level interactions. This graph convolution is applied to \(F_{SPP}\) (i.e., \(\textbf{F}^{P\times C}\) ), described above. The layer-wise feature propagation rule is defined as:
where \(l=0, 1, \dots , L-1\) indexes the layers and \(\textbf{W}^{(l)}\) is the weight matrix of the l -th layer. A non-linear activation function ( e.g. , ReLU) is denoted by \(\sigma (.)\) . The symmetrically normalized adjacency matrix is \(\hat{\tilde{\textbf{A}}}=Q\tilde{\textbf{A}}Q\) , where \(Q=\tilde{\textbf{D}}^{-1/2}\) and \(\tilde{\textbf{D}}\) is the diagonal node degree matrix of \(\tilde{\textbf{A}}\) (defined in Eq. 3 ). The reshaped convolutional feature map \({\textbf {F}}\) is fed into two graph convolutional layers, which capture local neighborhoods via the rectified linear unit (ReLU) non-linearities of the graph convolutional layers. The dimension of the output feature maps remains the same as the input of the GCN layers, i.e., \( \textbf{G}^{(L)}\rightarrow {\textbf {F}}_{G}\) \(\in \mathbb {R}^{{P}\times {C}}\) . The node features could be squeezed to a lower dimension, but this may lose information essential to spatial modeling; hence, the channel dimension is kept uniform within the network pipeline in our study. Afterward, the graph-transformed feature maps ( \({\textbf {F}}_{G}\) ) are pooled using a GAP to select the most discriminative channel-wise feature maps of the nodes.
Sample images of banana dataset showing the nutrition deficiency of iron, calcium, and magnesium.
Sample images of coffee nutrition deficiency of boron, manganese, and nitrogen.
Generally, regularization is a standard way to tackle training-related challenges such as overfitting. Here, layer normalization and dropout layers are interposed as regularization to handle overfitting. Lastly, \({F}_{final}\) is passed through a softmax layer to compute the output probability of the predicted class label \(\bar{b}\) , corresponding to the actual label \(b \in Y\) of the object classes Y .
The categorical cross-entropy loss function ( \(\mathscr {L}_{CE}\) ) and the stochastic gradient descent (SGD) optimizer with a \(10^{-3}\) learning rate have been chosen for the experiments.
where \(Y_i\) is the actual class label, \(\hat{Y}_i\) is the predicted class probability computed by the softmax activation function \(\sigma (.)\) in the classification layer, and N is the total number of classes.
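A minimal sketch of the softmax plus categorical cross-entropy computation used in the classification layer (the logit values below are arbitrary illustrations):

```python
import math

# Softmax over raw class scores, then categorical cross-entropy against a
# one-hot label. A sketch of the loss, not the framework implementation.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]   # shift for numerical stability
    s = sum(exps)
    return [e / s for e in exps]

def categorical_cross_entropy(y_true, logits):
    """y_true: one-hot actual label; logits: raw class scores."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(y_true, probs))

# Example: true class is index 1, with illustrative logits
loss = categorical_cross_entropy([0, 1, 0], [0.5, 2.0, -1.0])
print(round(loss, 4))
```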
At first, the implementation description is provided, followed by a summary of the datasets. The experiments have been conducted using conventional classification and cross-validation protocols. Performance is evaluated using the standard metrics: accuracy, precision, recall, and F1-score (Eq. 8 ).
where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives. However, accuracy is not a good assessment metric when the data distribution among classes is imbalanced. To avoid such misleading evaluation, precision and recall are useful metrics, from which the F1-score is derived. These three metrics are widely used for evaluating predictive performance when classes are imbalanced. In addition, we have evaluated performance using confusion matrices, which provide a reliable assessment of our model. The performances have been compared with existing methods, as discussed below.
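The four metrics follow directly from the confusion counts; a small sketch with illustrative counts shows why accuracy can flatter an imbalanced split:

```python
# Accuracy, precision, recall, and F1 from confusion counts (Eq. 8).
# The counts below are illustrative, chosen to show class imbalance.

def metrics(TP, TN, FP, FN):
    accuracy = (TP + TN) / (TP + TN + FP + FN)
    precision = TP / (TP + FP)
    recall = TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Imbalanced example: 90 easy negatives inflate accuracy relative to F1
acc, prec, rec, f1 = metrics(TP=8, TN=90, FP=1, FN=1)
print(f"acc={acc:.3f} prec={prec:.3f} rec={rec:.3f} f1={f1:.3f}")
```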
A concise description of the model development regarding hardware resources, software implementation, data distribution, evaluation protocols, and related details is furnished below for easier understanding.
The Inception-V3, Xception, ResNet-50, and MobileNet-V2 backbone CNNs with pre-trained ImageNet weights are used for convolutional feature computation from the input images. The Inception module focuses on increasing network depth using 5 \(\times \) 5, 3 \(\times \) 3, and 1 \(\times \) 1 convolutions 58 . Later, the 5 \(\times \) 5 convolution was replaced by factorizing it into 3 \(\times \) 3 filters 59 . The Inception module was further refined by decoupling channel-wise and spatial correlations through point-wise and depth-wise separable convolutions, which are the building blocks of the Xception architecture 60 . The separable convolution follows depth-wise convolution for spatial aggregation (3 \(\times \) 3 filters) and point-wise convolution (1 \(\times \) 1 filters) for cross-channel aggregation into a single feature map. Xception is a three-fold architecture built from depth-wise separable convolution layers with residual connections. The residual (a.k.a. shortcut) connection is the central idea of the deep residual learning framework, widely known as the ResNet architecture 61 . Residual learning represents an identity mapping through a shortcut connection, a simple addition of the feature maps of previous layers rendered using 3 \(\times \) 3 and 1 \(\times \) 1 convolutions. This identity mapping incurs no additional computational overhead and still eases the degradation problem. In a similar fashion, MobileNet-V2 uses bottleneck separable convolutions with kernel size 3 \(\times \) 3 and inverted residual connections 62 . It is a memory-efficient framework suitable for mobile devices.
These backbones are widely used in existing works on diverse image classification problems (e.g., human activity recognition, object classification, and disease prediction) due to their superior architectural designs 63 , 64 at reasonable computational cost. Here, these backbones are used for a fair performance comparison with the state-of-the-art methods developed for plant nutrition and disease classification 65 . We have customized the top layers of the base CNNs to add the GCN module without altering their inherent layer-wise building blocks or convolutional design, such as kernel sizes, skip connections, output feature dimensions, and other design parameters. The basic characteristics of these backbone CNNs are briefed in Table 1 . The network depth, model size, and parameter counts increase with the addition of GCN layers upon the base CNN, as evident in Table 1 .
Two GCN layers have been used with ReLU activation, and the feature size is the same as the base CNN's output channel dimension. For example, the channel feature size of ResNet-50, Xception, and Inception-V3 is 2048, which is kept as the dimension of the GCN's channel feature map. The adjacency matrix is built as a complete graph over the overall spatial relations among the different neighborhood regions. Therefore, each region is related to all other regions even if they are far apart, which helps capture long-distance feature interactions and build a holistic feature representation via the complete graph structure. Batch normalization and a dropout rate of 0.3 are applied in the overall network design to reduce overfitting.
The basic pre-processing provided by the Keras applications for each backbone has been applied. It converts the input images from RGB to BGR, and then each color channel is zero-centered with respect to the ImageNet dataset, without any scaling. Data augmentation methods such as random rotation (±25 \(^{\circ }\) ), scaling (±0.25), Gaussian blur, and random cropping to 224 \(\times \) 224 from the 256 \(\times \) 256 input size are applied on the fly for diversity in the image samples.
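This caffe-style preprocessing can be mimicked per pixel as follows; the ImageNet channel means are the standard values Keras applies in this mode, while the single-pixel helper is a simplification for illustration:

```python
# Caffe-style preprocessing sketch (as applied by Keras applications for
# backbones such as ResNet-50): RGB -> BGR channel swap, then zero-center
# each channel with the ImageNet means, with no scaling.

IMAGENET_BGR_MEANS = (103.939, 116.779, 123.68)   # B, G, R channel means

def preprocess_pixel(rgb):
    """Preprocess one [R, G, B] pixel; illustrative single-pixel helper."""
    b, g, r = rgb[2], rgb[1], rgb[0]              # RGB -> BGR swap
    return [b - IMAGENET_BGR_MEANS[0],
            g - IMAGENET_BGR_MEANS[1],
            r - IMAGENET_BGR_MEANS[2]]            # zero-center, no scaling

print(preprocess_pixel([255.0, 0.0, 0.0]))        # a pure-red input pixel
```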
We have maintained the same train-test split provided with the datasets, e.g., PlantDoc. However, the other plant datasets do not provide any specific image distribution. Thus, we have randomly divided those datasets into train and test samples following a 70:30 split ratio, as followed in several works. The details of the image distribution are provided in Table 3 . For cross-validation, we have randomly divided the training samples into training and validation sets with a 4:1 ratio, i.e., five-fold cross-validation with disjoint folds, which is a standard technique adopted in other methods 66 . The test set remains unaltered for both evaluation schemes for a clear performance comparison. Finally, the average test accuracy of five executions on each dataset has been reported here as the overall performance of the PND-Net.
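The 70:30 split followed by five disjoint folds can be sketched as below; the dataset of 1000 dummy indices and the fixed seed are assumptions for illustration:

```python
import random

# 70:30 train-test split, then five disjoint folds of the train set
# (each fold leaves out one fifth for validation, a 4:1 ratio).
# The sample count and seed are illustrative assumptions.

def split_dataset(samples, train_ratio=0.7, n_folds=5, seed=42):
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    train, test = shuffled[:cut], shuffled[cut:]     # test set never changes
    fold_size = len(train) // n_folds
    folds = [train[k * fold_size:(k + 1) * fold_size]
             for k in range(n_folds)]                # disjoint validation folds
    return train, test, folds

train, test, folds = split_dataset(list(range(1000)))
print(len(train), len(test), [len(f) for f in folds])
```

In each of the five runs, one fold serves as validation and the other four as training, while the held-out test set stays fixed.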
A summary of the implementation, indicating the hardware and software environments, training hyper-parameters, data augmentations, and estimated training and inference times (milliseconds), is given in Table 2 . Our model is trained with a mini-batch size of 12 for 150 epochs, and the learning rate is divided by 5 after 100 epochs. No other criterion, such as early stopping, has been followed. The proposed method is developed in TensorFlow 2.x using Python.
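This training schedule reduces to a simple step function for the learning rate (a sketch of the stated setup, not the authors' exact callback):

```python
# Step learning-rate schedule sketch: SGD starts at 1e-3 and the rate is
# divided by 5 after epoch 100 of the 150-epoch run described above.

def learning_rate(epoch, base_lr=1e-3, drop_epoch=100, factor=5.0):
    """Return the learning rate for a given 0-indexed epoch."""
    return base_lr if epoch < drop_epoch else base_lr / factor

print(learning_rate(0), learning_rate(120))
```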
Sample images of potato diseases infected by bacteria, pest, and nematodes.
Sample images of infected leaves of soybean, tomato, and bell pepper from the PlantDoc dataset.
The four plant datasets used in this work are summarized in Table 3 . These datasets were collected from public repositories such as Mendeley Data and Kaggle.
The Banana nutrition deficiency dataset represents healthy samples and the visual symptoms of deficiencies of Boron, Calcium, Iron, Magnesium, Potassium, Sulphur, and Zinc. Samples from this dataset are shown in Fig. 2 . More details are provided in Ref. 25 .
The Coffee nutrition deficiency dataset (CoLeaf-DB) 26 represents healthy samples along with deficiency classes including Boron, Calcium, Iron, Manganese, Magnesium, Nitrogen, Potassium, and Phosphorus. Samples from this dataset are illustrated in Fig. 3 .
The Potato disease classes are: Virus, Phytophthora, Pest, Nematode, Fungi, Bacteria, and healthy. Samples from this dataset are shown in Fig. 4 . The dataset is collected from the Mendeley 67 repository.
PlantDoc is a realistic plant disease dataset 65 comprising different disease classes of Apple, Tomato, Potato, Strawberry, Soybean, Raspberry, Grapes, Corn, Bell-pepper, and others. Examples are shown in Fig. 5 .
The Breast Cancer Histopathology Image Classification (BreakHis) 68 dataset with 40 \(\times \) and 100 \(\times \) magnifications contains eight classes: adenosis, fibroadenoma, phyllodes tumor, and tubular adenoma; and ductal carcinoma, lobular carcinoma, mucinous carcinoma, and papillary carcinoma. Samples from this dataset are exemplified in Fig. 6 .
SIPaKMeD 69 contains 4050 single-cell images and is useful for classifying cervical cells in Pap smear images, as shown in Fig. 7 . The dataset is categorized into five classes based on cytomorphological features.
Sample images of the BreakHis-40 \(\times \) dataset.
Sample images of the SIPaKMeD dataset.
A summary of the datasets with their data distributions, along with the baseline accuracy (%) achieved by the aforesaid base CNNs, is given in Table 3 . The baseline model is built from a pre-trained CNN backbone with ImageNet weights: the backbone extracts the base output feature map, which is pooled by a global average pooling layer and classified with a softmax layer. Four backbone CNNs with different design characteristics have been used to generalize our proposed method. The baseline accuracies are reasonable and consistent across the various datasets, as evident in Table 3 .
Two different evaluation strategies, i.e., conventional classification and k -fold cross-validation ( \(k=5\) ), have been tested. An average performance over multiple executions on each dataset is reported here. The top-1 accuracies (%) of the proposed PND-Net, comprising two GCN layers with feature dimension 2048 on top of different backbone CNNs, are given in Table 4 . The overall performance of PND-Net on all datasets is significantly improved over the baselines, clearly showing the efficiency of the proposed method. In addition, the PND-Net model has been tested with five-fold cross-validation for a robust performance analysis (“Fivefold cross validation experiments”). These cross-validation results (Tables 5 , 6 and 7 ) on each dataset can be considered benchmark performances under several metrics. Our method has achieved state-of-the-art performance on these datasets for plant disease and nutrition deficiency recognition.
An experimental study has also been carried out on two more public datasets for human medical image analysis: BreakHis with 40 \(\times \) and 100 \(\times \) magnifications 68 and SIPaKMeD 69 have been evaluated for generalization. The SIPaKMeD dataset 69 , illustrated in Fig. 7 , is useful for classifying cervical cells in Pap smear images into five classes based on cytomorphological features. The conventional classification results are given in Table 4 , and the cross-validation performances are provided in Tables 6 and 7 .
Five-fold cross-validation experiments on the various datasets have been conducted to evaluate the performance of PND-Net using the ResNet-50 and Xception backbones, and the results are given in Tables 5 , 6 , and 7 . The original train set of each dataset is divided into five disjoint subsets of images. In each experiment, four of the five subsets are used for training and the remaining one is validated independently. Finally, the average validation result over the five folds is reported.
The results of five-fold cross-validation on the potato leaf disease dataset are provided in Table 5 . The numbers of potato leaf images in each fold for training, validation, and testing are 1608, 402, and 869, respectively. The results under different metrics are computed, and the last row gives the average cross-validation performance on this dataset.
Likewise, the five-fold cross-validation performance on the BreakHis-40 \(\times \) dataset is presented in Table 6 . In this experiment, the number of training samples in each fold is 1280 images, and the validation set contains the remaining 320 images. The test set contains 400 images, the same as used in the aforesaid experiments. Each of the five folds has been validated and tested on this test set. Lastly, the average result of the five-fold cross-validation has been computed and is given in the last row of Table 6 .
A similar five-fold cross-validation set-up has been followed for the other datasets. The average performances of PND-Net on these datasets are provided in Table 7 . The average cross-validation results are better than the conventional classification approach on the potato disease (ResNet-50: 94.48%) and BreakHis-40 \(\times \) (ResNet-50: 97.10%) datasets. The reason could be that the variation of the validation set across folds enhances the model's learning capacity through training data diversity. As a result, improved performances have been achieved on diverse datasets. The results are consistent with those of the conventional classification method on the other datasets, as described above. The overall performances across different datasets validate the generalization capability of the proposed PND-Net.
Confusion matrices have been computed using the proposed PND-Net with ResNet-50 backbone on: ( a ) top-row: PlantDoc; ( b ) bottom-row: potato, coffee, and banana datasets.
Confusion matrix on the BreakHis-40 \(\times \) dataset (left) and smear PAP cell dataset (right) using the proposed PND-Net built upon the Xception backbone.
The t-SNE plots on the Potato leaf dataset using PND-Net with ResNet-50 (left) and Inception-V3 (right).
The Grad-CAM output of various datasets are shown, from left to right: nutrition deficiency, potato and corn diseases, and breast cancer. The top-row shows an original image and its corresponding Grad-CAM image is shown in the bottom row.
The model parameters, in millions, are provided in Table 8 . They have been estimated for three cases: (a) the baseline, i.e., the backbone CNN only; and an output feature dimension of the GCN layers of (b) 1024 and (c) 2048. The average computational time of PND-Net using ResNet-50 has been estimated: the training time is 15.4 ms per image, the inference time is 5.8 ms per image, and the model size is 122 MB (given in Table 2 ). The confusion matrices on the four plant datasets are shown in Fig. 8 , indicating the overall performance using ResNet-50. The feature map distributions form clearly separated clusters in the t-SNE diagrams 70 , shown for two backbone models on the potato leaf dataset in Fig. 10 . The gradient-weighted class activation mapping (Grad-CAM) 71 is illustrated in Fig. 11 for visual explanation; it clearly shows the discriminative regions of different images.
The highest accuracies reported on Banana nutrition classification were 78.76% and 87.89% using the raw dataset and an augmented version of the original dataset, respectively 72 . In contrast, our method has attained 84.0% using the lightweight MobileNet-V2 and a best of 90.0% using ResNet-50 on the raw dataset, implying a significant improvement in accuracy on this dataset.
The performances of PND-Net on CoLeaf-DB (the Coffee dataset) are very similar across backbones, and the best accuracy (90.54%) is attained by Xception. The differences from the other base CNNs are very small, implying consistent performance. The baseline result using ResNet-50 reported on this recent public dataset is 87.75% 26 . Thus, our method sets new benchmark results on CoLeaf-DB for further enhancement in the future. Likewise, the Potato Leaf Disease dataset is a new one 67 , collected from the Mendeley data source. We are the first to provide in-depth results on this realistic dataset, acquired in an uncontrolled environment.
A deep learning method attained 81.53% accuracy using an Xception backbone and 78.34% using Inception-V3 on the PlantDoc dataset 73 . In contrast, our PND-Net has attained 84.30% accuracy using Xception and 81.0% using Inception-V3. This evinces that PND-Net is more effective in discriminating plant diseases compared to the best reported existing methods. Clearly, the proposed graph-based network (PND-Net) is capable of distinguishing different types of nutrition deficiencies and plant diseases with a higher success rate on real-world public datasets.
The BreakHis dataset has been studied for 4-class and binary classification in several existing works. However, for a fair comparison, we compare with works classifying into 8 categories at the image level. The top-1 accuracy attained using Xception is 94.83%, whereas the state-of-the-art accuracy on this dataset is 93.40 ± 1.8%, achieved using a hybrid harmonization technique 74 ; an accuracy of 92.8 ± 2.1% has been reported using a class structure-based deep CNN 75 . Our cross-validation results (ResNet-50: 97.10%) improve over these existing methods.
Several deep learning methods have been tested on the SIPaKMeD dataset. A CNN-based method achieved 95.35 ± 0.42% accuracy 69 , a PCA-based technique obtained 97.87% accuracy for 5-class classification 76 , 98.30% was reported using Xception 77 , and 98.26% using a DarkNet-based exemplar pyramid deep model 78 . A GCN-based method has reported 98.37 ± 0.57% accuracy 54 . A few more comparative results are studied in Ref. 79 . In contrast, our method has achieved 98.98 ± 0.20% accuracy, and 99.10% test accuracy with cross-validation, using the Xception backbone on this dataset. The confusion matrices on both human disease datasets are shown in Fig. 9 . Overall, the rigorous experimental results imply that the proposed method achieves state-of-the-art performance on different types of datasets representing plant nutrition deficiency, plant disease, and human disease classification.
An in-depth ablation study has been carried out to observe the efficacy of the key components of PND-Net. First, the significance of computing different local regions is studied. These fixed-size regional descriptors are combined to create a holistic representation of the feature maps over the baseline features. Notably, the region pooling technique has improved the overall performance on all datasets; e.g., the gain is more than 12% on the Banana nutrition deficiency dataset using the ResNet-50 backbone. The results of this study are provided in Table 9 .
Afterward, a component-level study has been conducted by removing a module from the proposed PND-Net to observe the influence of that key component on performance. An ablation depicting the significance of the spatial pyramid pooling (SPP) layer has been carried out, and the results are shown in Table 10 . As the selection of discriminatory information over multiple pyramidal structures is avoided, the model might overlook finer details that would have been captured at multiple scales by the SPP layer. This causes an obvious degradation of the network's capacity, which is evident from the performances. Thus, capturing multi-scale features helps select relevant features for effective learning of plant health conditions.
Next, the efficacious GCN modules are excluded from the network architecture, and experiments are conducted with regional features selected by our composite pooling modules (i.e., regions + SPP) from the upsampled high-level deep features of a base CNN. The results are provided in Table 11 .
( a ) The performances of various formulations of the numbers of regions and spatial pyramid pooling feature vectors; ( b ) the performances of different channel-wise node features within GCN layers activation and propagation in the proposed method using the ResNet-50 backbone.
It is evident that the GCN module indeed improves performance remarkably. In the case of the Banana dataset with the Xception backbone, the accuracy of PND-Net is 89.25%, whereas without the GCN layers the accuracy degrades to 81.46%, a 7.79% drop. Moreover, one GCN layer (Banana: 86.0%) does not suffice to render state-of-the-art performance on these plant datasets. The results with a one-layer GCN on all datasets are shown in Table 12 . Indeed, two GCN layers are beneficial in enhancing performance over one layer, as is evident in the literature 22 . Hence, two GCN layers are included in the proposed PND-Net architecture.
A comparative study of different numbers of regions and numbers of pyramid-pooled feature vectors using ResNet-50 is shown in Fig. 12 a, which clearly shows a gradual improvement in accuracy on the PlantDoc and Banana datasets. Lastly, the influence of different feature vector sizes in the GCN layer activations has been studied. Channel dimensions of 1024 and 2048 have been chosen for building the graph structures using the ResNet-50 backbone, with the same channel dimensions used throughout the PND-Net architecture. The results of these variations (Fig. 12 b) provide insightful implications about the performance of the GCN layers.
The performances of PND-Net with a GCN output feature vector size of 1024 are summarized in Table 13 . The results are very competitive with those for the GCN size of 2048. Thus, the model with a 1024 GCN feature size could be preferred, considering the trade-off between the model's parametric capacity and its performance. The detailed experimental studies imply an overall performance boost on all datasets, and the proposed PND-Net achieves state-of-the-art results. In addition, new public datasets have been benchmarked for further enhancement.
However, other categories of images, such as high-resolution and hyperspectral images, have not been evaluated; one reason is the unavailability of such plant datasets for public research. Additional data modalities, such as soil-sensor information, could also be utilized for developing fusion-based approaches. Several existing ensemble methods use multiple backbones and therefore suffer from higher computational complexity. Although our method performs better than several existing works, the computational complexity of PND-Net in terms of model parameters and size could still be improved: plugging the GCN module on top of the backbone CNN incurs additional parameters. To address this challenge, the graph convolutional layers could be simplified to reduce model complexity. In addition, more realistic agricultural datasets representing field conditions such as occlusion, cluttered backgrounds, and lighting variations could be developed. These limitations of the proposed PND-Net will be explored in the near future.
In this paper, a deep network called PND-Net has been proposed for plant nutrition deficiency recognition using a GCN module added on top of a CNN backbone. The performance has been evaluated on four image datasets representing plant nutrition deficiencies and leaf diseases, all of which have recently been introduced publicly for assessment. The network has been generalized by building the model on four standard backbone CNNs, and the architecture has been improved by incorporating pyramid pooling over region-pooled feature maps and feature propagation via a GCN. We are the first to evaluate these nutrition-deficiency datasets for monitoring plant health and growth. Our method has attained state-of-the-art performance on the PlantDoc dataset for plant disease recognition. We encourage researchers to pursue further enhancements on these public datasets for early-stage detection of plant abnormalities, which is essential for sustainable agricultural growth. Furthermore, experiments have been conducted on the BreakHis (40 \(\times \) and 100 \(\times \) magnifications) and SIPaKMeD datasets, which are suitable for human health diagnosis, and the proposed PND-Net has attained enhanced performance on these datasets too. In the future, new deep learning methods will be developed for early-stage plant disease detection and health monitoring with balanced nutrition using other data modalities and imaging techniques.
The six datasets that support the findings of this work are available at the following links. The Nutrient Deficient of Banana Plant dataset 25 is available at https://data.mendeley.com/datasets/7vpdrbdkd4/1 . The CoLeaf-DB dataset 26 for coffee leaf nutrition deficiency classification is available at https://data.mendeley.com/datasets/brfgw46wzb/1 . The Potato Leaf Disease Dataset 67 is available at https://data.mendeley.com/datasets/ptz377bwb8/1 . The PlantDoc dataset 65 is available at https://github.com/pratikkayal/PlantDoc-Dataset . The BreakHis dataset 68 is available at https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/ and can also be downloaded from https://data.mendeley.com/datasets/jxwvdwhpc2/1 . The original SIPaKMeD dataset 69 can be found at https://www.cs.uoi.gr/marina/sipakmed.html and on Kaggle at https://www.kaggle.com/datasets/mohaliy2016/papsinglecell .
Jung, M. et al. Construction of deep learning-based disease detection model in plants. Sci. Rep. 13 , 7331 (2023).
Aiswarya, J., Mariammal, K. & Veerappan, K. Plant nutrient deficiency detection and classification-a review. In 2023 5th International Conference Inventive Research in Computing Applications (ICIRCA) . 796–802 (IEEE, 2023).
Yan, Q., Lin, X., Gong, W., Wu, C. & Chen, Y. Nutrient deficiency diagnosis of plants based on transfer learning and lightweight convolutional neural networks Mobilenetv3-large. In Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition . 26–33 (2022).
Sudhakar, M. & Priya, R. Computer vision based machine learning and deep learning approaches for identification of nutrient deficiency in crops: A survey. Nat. Environ. Pollut. Technol. 22 (2023).
Noon, S. K., Amjad, M., Qureshi, M. A. & Mannan, A. Use of deep learning techniques for identification of plant leaf stresses: A review. Sustain. Comput. Inform. Syst. 28 , 100443 (2020).
Waheed, H. et al. Deep learning based disease, pest pattern and nutritional deficiency detection system for “Zingiberaceae’’ crop. Agriculture 12 , 742 (2022).
Barbedo, J. G. A. Detection of nutrition deficiencies in plants using proximal images and machine learning: A review. Comput. Electron. Agric. 162 , 482–492 (2019).
Shadrach, F. D., Kandasamy, G., Neelakandan, S. & Lingaiah, T. B. Optimal transfer learning based nutrient deficiency classification model in ridge gourd ( Luffa acutangula ). Sci. Rep. 13 , 14108 (2023).
Sathyavani, R., JaganMohan, K. & Kalaavathi, B. Classification of nutrient deficiencies in rice crop using DenseNet-BC. Mater. Today Proc. 56 , 1783–1789 (2022).
Haris, S., Sai, K. S., Rani, N. S. et al. Nutrient deficiency detection in mobile captured guava plants using light weight deep convolutional neural networks. In 2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC) . 1190–1193 (IEEE, 2023).
Munir, S., Seminar, K. B., Sukoco, H. et al. The application of smart and precision agriculture (SPA) for measuring leaf nitrogen content of oil palm in peat soil areas. In 2023 International Conference on Computer Science, Information Technology and Engineering (ICCoSITE) . 650–655 (IEEE, 2023).
Lu, J., Peng, K., Wang, Q. & Sun, C. Lettuce plant trace-element-deficiency symptom identification via machine vision methods. Agriculture 13 , 1614 (2023).
Omer, S. M., Ghafoor, K. Z. & Askar, S. K. Lightweight improved YOLOv5 model for cucumber leaf disease and pest detection based on deep learning. In Signal, Image and Video Processing . 1–14 (2023).
Kumar, A. & Bhowmik, B. Automated rice leaf disease diagnosis using CNNs. In 2023 IEEE Region 10 Symposium (TENSYMP) . 1–6 (IEEE, 2023).
Senjaliya, H. et al. A comparative study on the modern deep learning architectures for predicting nutritional deficiency in rice plants. In 2023 IEEE IAS Global Conference on Emerging Technologies (GlobConET) . 1–6 (IEEE, 2023).
Ennaji, O., Vergutz, L. & El Allali, A. Machine learning in nutrient management: A review. Artif. Intell. Agric. (2023).
Rathnayake, D., Kumarasinghe, K., Rajapaksha, R. & Katuwawala, N. Green insight: A novel approach to detecting and classifying macro nutrient deficiencies in paddy leaves. In 2023 8th International Conference Information Technology Research (ICITR) . 1–6 (IEEE, 2023).
Asaari, M. S. M., Shamsudin, S. & Wen, L. J. Detection of plant stress condition with deep learning based detection models. In 2023 International Conference on Energy, Power, Environment, Control, and Computing (ICEPECC) . 1–5 (IEEE, 2023).
Tavanapong, W. et al. Artificial intelligence for colonoscopy: Past, present, and future. IEEE J. Biomed. Health Inform. 26 , 3950–3965 (2022).
Kipf, T. N. & Welling, M. Semi-supervised classification with graph convolutional networks. In International Conference on Learning Representations (2017).
Zhang, S., Tong, H., Xu, J. & Maciejewski, R. Graph convolutional networks: A comprehensive review. Comput. Soc. Netw. 6 , 1–23 (2019).
Bera, A., Wharton, Z., Liu, Y., Bessis, N. & Behera, A. SR-GNN: Spatial relation-aware graph neural network for fine-grained image categorization. IEEE Trans. Image Process. 31 , 6017–6031 (2022).
Qu, Z., Yao, T., Liu, X. & Wang, G. A graph convolutional network based on univariate neurodegeneration biomarker for Alzheimer’s disease diagnosis. IEEE J. Transl. Eng. Health Med. (2023).
Khlifi, M. K., Boulila, W. & Farah, I. R. Graph-based deep learning techniques for remote sensing applications: Techniques, taxonomy, and applications—A comprehensive review. Comput. Sci. Rev. 50 , 100596 (2023).
Sunitha, P., Uma, B., Channakeshava, S. & Babu, S. A fully labelled image dataset of banana leaves deficient in nutrients. Data Brief 48 , 109155 (2023).
Tuesta-Monteza, V. A., Mejia-Cabrera, H. I. & Arcila-Diaz, J. CoLeaf-DB: Peruvian coffee leaf images dataset for coffee leaf nutritional deficiencies detection and classification. Data Brief 48 , 109226 (2023).
Chungcharoen, T. et al. Machine learning-based prediction of nutritional status in oil palm leaves using proximal multispectral images. Comput. Electron. Agric. 198 , 107019 (2022).
Bhavya, T., Seggam, R. & Jatoth, R. K. Fertilizer recommendation for rice crop based on NPK nutrient deficiency using deep neural networks and random forest algorithm. In 2023 3rd International Conference on Artificial Intelligence and Signal Processing (AISP) . 1–5 (IEEE, 2023).
Dey, B., Haque, M. M. U., Khatun, R. & Ahmed, R. Comparative performance of four CNN-based deep learning variants in detecting Hispa pest, two fungal diseases, and npk deficiency symptoms of rice ( Oryza sativa ). Comput. Electron. Agric. 202 , 107340 (2022).
Cevallos, C., Ponce, H., Moya-Albor, E. & Brieva, J. Vision-based analysis on leaves of tomato crops for classifying nutrient deficiency using convolutional neural networks. In 2020 International Joint Conference on Neural Networks (IJCNN) . 1–7 (IEEE, 2020).
Espejo-Garcia, B., Malounas, I., Mylonas, N., Kasimati, A. & Fountas, S. Using Efficientnet and transfer learning for image-based diagnosis of nutrient deficiencies. Comput. Electron. Agric. 196 , 106868 (2022).
Wang, C., Ye, Y., Tian, Y. & Yu, Z. Classification of nutrient deficiency in rice based on cnn model with reinforcement learning augmentation. In 2021 International Symposium on Artificial Intelligence and its Application on Media (ISAIAM) . 107–111 (IEEE, 2021).
Bahtiar, A. R., Santoso, A. J., Juhariah, J. et al. Deep learning detected nutrient deficiency in chili plant. In 2020 8th International Conference on Information and Communication Technology (ICoICT) . 1–4 (IEEE, 2020).
Rahadiyan, D., Hartati, S., Nugroho, A. P. et al. Feature aggregation for nutrient deficiency identification in chili based on machine learning. Artif. Intell. Agric. (2023).
Aishwarya, M. & Reddy, P. Ensemble of CNN models for classification of groundnut plant leaf disease detection. Smart Agric. Technol. 100362 (2023).
Nadafzadeh, M. et al. Design, fabrication and evaluation of a robot for plant nutrient monitoring in greenhouse (case study: iron nutrient in spinach). Comput. Electron. Agric. 217 , 108579 (2024).
Desiderio, J. M. H., Tenorio, A. J. F. & Manlises, C. O. Health classification system of romaine lettuce plants in hydroponic setup using convolutional neural networks (CNN). In 2022 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET) . 1–6 (IEEE, 2022).
Costa, L., Kunwar, S., Ampatzidis, Y. & Albrecht, U. Determining leaf nutrient concentrations in citrus trees using UAV imagery and machine learning. Precis. Agric. 1–22 (2022).
Lanjewar, M. G. & Parab, J. S. CNN and transfer learning methods with augmentation for citrus leaf diseases detection using PaaS cloud on mobile. Multimed. Tools Appl. 1–26 (2023).
Lanjewar, M. G. & Morajkar, P. P. Modified transfer learning frameworks to identify potato leaf diseases. Multimed. Tools Appl. 1–23 (2023).
Dissanayake, A. et al. Detection of diseases and nutrition in bell pepper. In 2023 5th International Conference on Advancements in Computing (ICAC) . 286–291 (IEEE, 2023).
Wu, Z., Jiang, F. & Cao, R. Research on recognition method of leaf diseases of woody fruit plants based on transfer learning. Sci. Rep. 12 , 15385 (2022).
Liu, H., Lv, H., Li, J., Liu, Y. & Deng, L. Research on maize disease identification methods in complex environments based on cascade networks and two-stage transfer learning. Sci. Rep. 12 , 18914 (2022).
Kukreja, V., Sharma, R., Vats, S. & Manwal, M. DeepLeaf: Revolutionizing rice disease detection and classification using convolutional neural networks and random forest hybrid model. In 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT) . 1–6 (IEEE, 2023).
Bezabih, Y. A., Salau, A. O., Abuhayi, B. M., Mussa, A. A. & Ayalew, A. M. CPD-CCNN: Classification of pepper disease using a concatenation of convolutional neural network models. Sci. Rep. 13 , 15581 (2023).
Kini, A. S., Prema, K. & Pai, S. N. Early stage black pepper leaf disease prediction based on transfer learning using convnets. Sci. Rep. 14 , 1404 (2024).
Wu, Q. et al. A classification method for soybean leaf diseases based on an improved convnext model. Sci. Rep. 13 , 19141 (2023).
Ma, X., Chen, W. & Xu, Y. ERCP-Net: A channel extension residual structure and adaptive channel attention mechanism for plant leaf disease classification network. Sci. Rep. 14 , 4221 (2024).
Babatunde, R. S. et al. A novel smartphone application for early detection of habanero disease. Sci. Rep. 14 , 1423 (2024).
Nagasubramanian, G. et al. Ensemble classification and IoT-based pattern recognition for crop disease monitoring system. IEEE Internet Things J. 8 , 12847–12854 (2021).
Nachtigall, L. G., Araujo, R. M. & Nachtigall, G. R. Classification of apple tree disorders using convolutional neural networks. In 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI) . 472–476 (IEEE, 2016).
Borhani, Y., Khoramdel, J. & Najafi, E. A deep learning based approach for automated plant disease classification using vision transformer. Sci. Rep. 12 , 11554 (2022).
Aishwarya, M. & Reddy, A. P. Dataset of groundnut plant leaf images for classification and detection. Data Brief 48 , 109185 (2023).
Shi, J. et al. Cervical cell classification with graph convolutional network. Comput. Methods Prog. Biomed. 198 , 105807 (2021).
Fahad, N. M., Azam, S., Montaha, S. & Mukta, M. S. H. Enhancing cervical cancer diagnosis with graph convolution network: AI-powered segmentation, feature analysis, and classification for early detection. Multimed. Tools Appl. 1–25 (2024).
Lanjewar, M. G., Panchbhai, K. G. & Patle, L. B. Fusion of transfer learning models with LSTM for detection of breast cancer using ultrasound images. Comput. Biol. Med. 169 , 107914 (2024).
He, K., Zhang, X., Ren, S. & Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37 , 1904–1916 (2015).
Szegedy, C. et al. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . 1–9 (2015).
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. & Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . 2818–2826 (2016).
Chollet, F. Xception: Deep learning with depthwise separable convolutions. In IEEE Conference on Computer Vision Pattern Recognition . 1251–1258 (2017).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition . 770–778 (2016).
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. MobileNetv2: Inverted residuals and linear bottlenecks. In Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition . 4510–4520 (2018).
Bera, A., Nasipuri, M., Krejcar, O. & Bhattacharjee, D. Fine-grained sports, yoga, and dance postures recognition: A benchmark analysis. IEEE Trans. Instrum. Meas. 72 , 1–13 (2023).
Bera, A., Wharton, Z., Liu, Y., Bessis, N. & Behera, A. Attend and guide (AG-Net): A keypoints-driven attention-based deep network for image recognition. IEEE Trans. Image Process. 30 , 3691–3704 (2021).
Singh, D. et al. PlantDoc: A dataset for visual plant disease detection. In Proceedings of the 7th ACM IKDD CoDS and 25th COMAD . 249–253 (ACM, 2020).
Hameed, Z., Garcia-Zapirain, B., Aguirre, J. J. & Isaza-Ruget, M. A. Multiclass classification of breast cancer histopathology images using multilevel features of deep convolutional neural network. Sci. Rep. 12 , 15600 (2022).
Shabrina, N. H. et al. A novel dataset of potato leaf disease in uncontrolled environment. Data Brief 52 , 109955 (2024).
Spanhol, F. A., Oliveira, L. S., Petitjean, C. & Heutte, L. A dataset for breast cancer histopathological image classification. IEEE Trans. Biomed. Eng. 63 , 1455–1462 (2015).
Plissiti, M. E. et al. SIPAKMED: A new dataset for feature and image based classification of normal and pathological cervical cells in Pap smear images. In 2018 25th IEEE International Conf. Image Processing (ICIP) . 3144–3148 (IEEE, 2018).
Van Der Maaten, L. Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res. 15 , 3221–3245 (2014).
Selvaraju, R. R. et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In 2017 IEEE International Conference on Computer Vision (ICCV) . 618–626 (2017).
Han, K. A. M., Maneerat, N., Sepsirisuk, K. & Hamamoto, K. Banana plant nutrient deficiencies identification using deep learning. In 2023 9th International Conference on Engineering, Applied Sciences, and Technology (ICEAST) . 5–9 (IEEE, 2023).
Ahmad, A., El Gamal, A. & Saraswat, D. Toward generalization of deep learning-based plant disease identification under controlled and field conditions. IEEE Access 11 , 9042–9057 (2023).
Abdallah, N. et al. Enhancing histopathological image classification of invasive ductal carcinoma using hybrid harmonization techniques. Sci. Rep. 13 , 20014 (2023).
Han, Z. et al. Breast cancer multi-classification from histopathological images with structured deep learning model. Sci. Rep. 7 , 4172 (2017).
Basak, H., Kundu, R., Chakraborty, S. & Das, N. Cervical cytology classification using PCA and GWO enhanced deep features selection. SN Comput. Sci. 2 , 369 (2021).
Mohammed, M. A., Abdurahman, F. & Ayalew, Y. A. Single-cell conventional pap smear image classification using pre-trained deep neural network architectures. BMC Biomed. Eng. 3 , 11 (2021).
Yaman, O. & Tuncer, T. Exemplar pyramid deep feature extraction based cervical cancer image classification model using pap-smear images. Biomed. Signal Process. Control 73 , 103428 (2022).
Jiang, H. et al. Deep learning for computational cytology: A survey. Med. Image Anal. 84 , 102691 (2023).
This work is supported by the New Faculty Seed Grant (NFSG) and Cross-Disciplinary Research Framework (CDRF: C1/23/168) projects, Open Access facilities, and necessary computational infrastructure at the Birla Institute of Technology and Science (BITS) Pilani, Pilani Campus, Rajasthan, 333031, India.
Authors and Affiliations
Department of Computer Science and Information Systems, BITS Pilani, Pilani Campus, Pilani, Rajasthan, 333031, India
Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, 700032, India
Debotosh Bhattacharjee
Faculty of Informatics and Management, University of Hradec Kralove, Hradec Kralove, Czech Republic
Debotosh Bhattacharjee & Ondrej Krejcar
Skoda Auto University, Na Karmeli 1457, 293 01, Mlada Boleslav, Czech Republic
Ondrej Krejcar
Malaysia Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia
A.B. played a pivotal role in this research: he contributed to the model development, coding, and generation of results, and prepared the initial manuscript. D.B. and O.K. both reviewed the work meticulously, validated the model, and corrected the manuscript to enhance the clarity of the text and the overall organization of the article. D.B. carefully read the manuscript and provided valuable input for improving the overall quality of the article.
Correspondence to Asish Bera .
Competing interests.
The authors declare no competing interests.
Publisher's note.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
Cite this article.
Bera, A., Bhattacharjee, D. & Krejcar, O. PND-Net: plant nutrition deficiency and disease classification using graph convolutional network. Sci Rep 14 , 15537 (2024). https://doi.org/10.1038/s41598-024-66543-7
Received : 25 March 2024
Accepted : 02 July 2024
Published : 05 July 2024
DOI : https://doi.org/10.1038/s41598-024-66543-7
GIHP: Graph convolutional neural network-based interpretable pan-specific HLA-peptide binding affinity prediction
Accurately predicting the binding affinities between Human Leukocyte Antigen (HLA) molecules and peptides is a crucial step in understanding the adaptive immune response, with important implications for the development of effective vaccines and the design of targeted immunotherapies. Existing sequence-based methods are insufficient to capture structural information, and current methods lack model interpretability, which hinders revealing the key binding amino acids between the two molecules. To address these limitations, we propose an interpretable graph convolutional neural network (GCNN)-based prediction method named GIHP. Considering the size difference between HLA molecules and short peptides, GIHP represents the HLA structure as an amino-acid-level graph and the peptide SMILES string as an atom-level graph. For interpretation, we design a novel visual explanation method, gradient-weighted activation mapping (Grad-WAM), for identifying key binding residues. GIHP achieves better prediction accuracy than state-of-the-art methods across various datasets. According to current research findings, mutations of key HLA-peptide binding residues directly impact immunotherapy efficacy. Therefore, we verified whether the highlighted key residues can significantly distinguish immunotherapy patient groups, and found that the identified functional residues successfully separate patient survival groups across breast, bladder, and pan-cancer datasets. The results demonstrate that GIHP improves the accuracy and interpretability of HLA-peptide prediction, and the findings of this study can be used to guide personalized cancer immunotherapy treatment. Code and datasets are publicly accessible at https://github.com/sdustSu/GIHP .
HLA molecules, also known as MHC (major histocompatibility complex) molecules, are responsible for presenting peptides derived from intracellular or extracellular proteins to T cells. This is a crucial step in understanding and predicting immune responses such as antigen presentation and T-cell activation (Kallingal et al., 2023). HLA molecules are classified into two major classes, class I and class II. Each class has different subtypes, and their binding abilities vary depending on the specific HLA subtype. For HLA class I, the binding groove, closed at both ends, restricts the size of the bound peptides to 8-12 residues, whereas HLA class II incorporates peptides of 13-25 residues (Wang and Claesson, 2014). As a result, existing methods can be classified into allele-specific and pan-specific methods: allele-specific methods focus on predicting the binding affinity for a specific HLA allele, while pan-specific methods aim to predict HLA-peptide binding in a more general way, without the need for allele-specific training data (Gizinski et al., 2024).
Allele-specific methods train separate models for each MHC allele and make predictions for individual alleles. NetMHC (Lundegaard et al., 2008) is a widely used allele-specific method that utilizes machine learning to learn the relationship between peptide sequences and their binding affinities to specific MHC alleles. NetMHC 4.0 (Andreatta and Nielsen, 2016) is also a sequence-based allele-specific method, which uses both BLOSUM62 and sparse encoding schemes to encode the peptide sequences into nine-amino-acid binding cores. In comparison with HLA molecules (around 360 aa in length), peptides are much shorter, and such methods must use insertion schemes to reconcile or extend the original sequence. In addition, deep learning-based methods have also been developed for MHC-peptide binding prediction. DeepMHCII (You et al., 2022) utilizes deep convolutional neural networks (CNNs) to capture complex sequence patterns and interactions between peptides and MHC class II molecules. It takes the peptide and MHC protein sequences as input and uses multiple layers of convolutional filters to extract features from the sequences. These filters scan the input sequences at different lengths, capturing both local and global patterns, and the extracted features are then fed into fully connected layers to predict the binding affinity. MHCAttnNet (Venkatesh et al., 2020) utilizes a combination of bidirectional long short-term memory (Bi-LSTM) and attention mechanisms to capture important features and dependencies in MHC-peptide interactions. The Bi-LSTM processes the sequences in both forward and backward directions, capturing the dependencies and context in the data, while the attention mechanism allows the model to weight different parts of the input sequences based on their relative importance. This enables the model to focus on the most relevant regions of the peptide and MHC sequences during prediction.
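The sparse (one-hot) encoding scheme mentioned above can be sketched in a few lines; the padding length `max_len=12` is an assumed value for class I peptides, not taken from any specific tool:

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_peptide(seq, max_len=12):
    """Encode a peptide as a max_len x 20 one-hot matrix, zero-padded on the
    right so peptides of different lengths share one fixed input shape."""
    mat = np.zeros((max_len, len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq[:max_len]):
        mat[pos, AA_INDEX[aa]] = 1.0
    return mat

enc = one_hot_peptide("SIINFEKL")  # a classic 8-mer example peptide
print(enc.shape)        # (12, 20)
print(int(enc.sum()))   # 8 — exactly one active entry per residue
```

A BLOSUM62 encoding has the same shape but replaces each one-hot row with that residue's substitution-score column, giving the model a notion of amino-acid similarity.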
SMM-align (Nielsen et al., 2007) utilizes structural and sequence-based features to predict binding affinities for MHC class I alleles. It employs a PSSM alignment algorithm to align target peptide sequences with known binders and derive binding predictions. MHC-NP (Giguere et al., 2013) also combines structural and sequence-based features and employs a random forest regression model to make predictions. Allele-specific methods are particularly useful when the focus is on specific alleles of interest, allowing more accurate predictions tailored to those alleles. However, developing and maintaining separate models for each allele requires a significant amount of experimental binding data and computational resources.
On the other hand, pan-specific methods have the advantage of predicting binding affinities not only for alleles present in the training data but also for new, unseen alleles. NetMHCpan and NetMHCIIpan (Reynisson et al., 2020) are widely used pan-specific methods. They take sequence features as input and utilize artificial neural networks (ANNs) to learn the relationship between peptide sequences and their binding affinities to MHCs, considering various sequence-based features including amino acid composition, physicochemical properties, and binding motifs. In comparison with these two methods, another pan-specific method, MHCflurry (O'Donnell et al., 2018; O'Donnell et al., 2020), integrates additional information, such as peptide processing predictions and binding affinity measurements from mass spectrometry-based experiments, to enhance its predictions. Some sequence-based methods, such as BERTMHC (Cheng et al., 2021), leverage the BERT language model to improve their performance. The BERT model is pre-trained on a vast corpus of text data, which enables it to capture intricate patterns and dependencies within input sequences. One advantage of using BERT for encoding peptide sequences is its ability to capture long-range dependencies and contextual information, which is particularly important in MHC binding prediction, where specific amino acid positions within a peptide can significantly affect the binding affinity. Because structure determines protein function, some methods also incorporate structural information into their predictions. MixMHCpred-2.0.1 (Gfeller et al., 2018) employs a deep learning architecture capable of learning complex patterns and relationships between peptide sequences and MHC binding affinities; the model is trained on a diverse set of MHC alleles and covers a wide range of peptide lengths.
This allows it to make accurate predictions for a broad range of MHC-peptide combinations. NetMHCpan-4.0 (Jurtz et al., 2017) utilizes a combination of structural and sequence-based features, incorporating information from MHC-peptide complex structures and using a machine learning approach to make pan-specific predictions. RPEMHC (Wang et al., 2024) is a deep learning approach that aims to improve the prediction of MHC-peptide binding affinity by utilizing a residue-residue pair encoding scheme; in RPEMHC, the peptide sequence and MHC binding groove are encoded as one-hot vectors representing each amino acid residue and its position. AutoDock is a widely used molecular docking package that can also be employed for MHC-peptide binding prediction. It uses a Lamarckian genetic algorithm to explore the conformational space and predict the binding modes and affinities of peptides within the MHC binding groove. By modelling the docking between the HLA protein and peptide ligands, such methods have achieved accurate binding prediction performance. However, docking methods rely on sampling different conformations of the peptide and MHC molecule to find the best binding pose, and because the conformational space of peptides and MHC molecules can be vast, exhaustively sampling all possible conformations is computationally infeasible.
In fact, whether allele-specific or pan-specific, all these methods can be broadly categorized into two main classes: sequence-based and structure-based. Sequence-based methods utilize machine learning techniques to capture the sequence motifs and physicochemical properties important for HLA-peptide binding. These methods employ various algorithms, such as support vector machines (SVMs), random forests, or ANNs, to learn the relationships between peptide sequences and binding affinities from large datasets, and they have the advantage of being computationally efficient and applicable to a wide range of HLA alleles and peptides. Structure-based methods leverage the three-dimensional structures of HLA molecules and peptides to predict binding affinities. Molecular docking algorithms, such as AutoDock, are commonly used to explore the conformational space and calculate binding energies; these methods require knowledge of the 3D structures of the HLA molecule and peptide, which limits their applicability when experimental structures are unavailable. Recent advances in deep learning, such as CNNs and recurrent neural networks (RNNs), have shown promise in HLA-peptide binding affinity prediction. Deep learning-based methods can effectively capture complex sequence patterns and structural features, leading to improved prediction accuracy (Wang et al., 2023). These models often incorporate encoding schemes to represent peptide sequences or structural features and are trained on large datasets to learn the relationships between sequences and binding affinities. Despite notable progress, HLA-peptide binding affinity prediction still faces challenges and limitations. First, deep learning models are often considered black boxes, meaning they lack interpretability: it can be challenging to understand the specific features or patterns that contribute to a model's predictions.
Interpretability is crucial in immunology research for gaining insight into the molecular mechanisms underlying MHC-peptide interactions and for guiding experimental studies. Second, existing methods often rely on sequence-based encoding schemes because experimentally determined 3D structures of HLA-peptide complexes are scarce. While sequence information is informative, excluding structural details may limit the accuracy and coverage of predictions, particularly where structural features play a crucial role. Even when tools do use structural information, they seldom model structural features at the amino-acid level. In addition, the length difference between the peptides that HLA can bind (typically around 8–15 amino acids) and HLA molecules themselves (which can exceed 360 amino acids) poses a challenge for HLA-peptide binding affinity prediction. Furthermore, unlike HLAs, peptides are too short to form stable structures. Existing methods do not adequately address these drawbacks.
Considering these limitations, we propose GIHP, an interpretable GCNN-based algorithm for predicting peptide binding to pan-HLA molecules. By representing peptide SMILES strings ( Quiros et al., 2018 ; Meng et al., 2024 ) and HLA structures as attributed graphs, GCNNs can effectively model the pairwise interactions between amino acids and capture both local and global structural features. GIHP also includes a novel visual explanation method, Grad-WAM, for interpreting HLA-peptide binding affinity predictions. By analyzing the learned representations and interactions within the graph structure, Grad-WAM identifies the key residues that contribute most to the HLA-peptide binding process. Comprehensive comparative evaluations demonstrate that GIHP performs well across diverse benchmark datasets. By applying the GIHP framework to several cancer immunotherapy datasets, we identified numerous promising biomarkers that effectively distinguish patients with and without treatment response. Moving forward, the insights gained from GIHP analysis can be leveraged to guide the development of more personalized cancer immunotherapy strategies.
2.1 Data collection and processing
We collected human HLA-peptide interaction datasets from published papers and publicly available databases ( Table 1 ).
Table 1 . Summary of the collected datasets after preprocessing.
Wang-2008 Dataset ( Wang et al., 2008 ): Experimentally measured peptide binding affinities for HLA class II molecules. The processed dataset contains 24,295 interaction entries in total, with ligand lengths ranging from 16 to 37 residues, and covers 26 unique HLA molecules, including HLA-DP and HLA-DQ.
Wang-2010 Dataset ( Wang et al., 2010 ): Experimentally measured peptide binding affinities for MHC class II molecules. After preprocessing, the dataset contains 9,478 measured affinities and covers 14 MHC class II alleles, with peptide lengths ranging from 9 to 37.
Kim-2014 Dataset ( Kim et al., 2014 ): This dataset was obtained from the Immune Epitope Database (IEDB) ( Vita et al., 2019 ) and includes binding affinity data compiled in 2009 (BD 2009) and 2013 (BD 2013), as well as a blind dataset, defined as the data remaining after subtracting BD 2009 from BD 2013. For all three datasets, only human data were kept for training. After preprocessing, the dataset contains 268,189 interactions in total, with peptide lengths ranging from 8 to 30.
Jurtz-2017 Dataset ( Jurtz et al., 2017 ): This dataset was originally compiled for training NetMHCpan-4.0. The final processed dataset has 3,618,591 entries in total, with ligand lengths ranging from 8 to 18.
Jensen-2018 Dataset ( Jensen et al., 2018 ): This dataset was used to train NetMHCIIpan-3.2 ( Karosiene et al., 2013 ) and contains HLA class II binding affinities retrieved from the IEDB in 2016. It comprises 131,008 data points, covering 36 HLA-DR, 27 HLA-DQ, and 9 HLA-DP molecules and 15,965 unique peptides, with peptide lengths ranging from 9 to 33.
Zhao-2018 Dataset ( Zhao and Sher, 2018 ): This dataset was compiled for training IEDB tools as well as MHCflurry ( O'Donnell et al., 2018 ). It contains 21,092 binding relations, covering 18 HLA-DR, 19 HLA-DQ, and 16 HLA-DP molecules and 2,168 unique peptides. All peptides are 15 residues long.
Reynisson-2020 Dataset ( Reynisson et al., 2020 ): This dataset was originally collected for training the NetMHCpan-4.1 and NetMHCIIpan-4.0 methods. It covers 161 distinct HLA class I molecules and 4,523,148 distinct peptides, with peptide lengths ranging from 8 to 15.
For all the collected training datasets, only binding affinity values in IC50 nM format are kept; these are log-transformed to the range between 0 and 1 by applying 1 − log(IC50 nM)/log(50k), as explained by Nielsen et al. (2003) . When classifying peptides into binders and non-binders, a threshold of 500 nM is used, meaning that peptides with log50k-transformed binding affinity values greater than 0.426 are classified as binders. We consolidated all the collected datasets, removing duplicate entries, to arrive at a final integrated dataset of 160,253 unique HLA-peptide interactions, covering 223 distinct HLA alleles and 35,481 peptide sequences. To further verify the generality of our method, we collected protein-peptide binding data from the pepBDB database ( Wen et al., 2019 ); after deleting peptides shorter than 8 aa, we obtained 12,655 interactions between 11,055 proteins and 7,811 peptides. Because our method takes HLA and protein structures as input, all structural data were downloaded from the PDB ( Berman et al., 2000 ) and the AlphaFold database ( Varadi et al., 2022 ), and some were predicted by AlphaFold2 ( Jumper et al., 2021 ) and RoseTTAFold ( Baek et al., 2021 ). Only high-resolution experimental structures (e.g., X-ray crystallography or cryo-EM data with resolution better than 3.0 Å) were included. All structural models, whether experimental or predicted, were validated by atomic contact evaluation and overall model quality assessment; only structures that passed these checks were retained for further analyses.
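The log50k transform and the binder threshold described above can be sketched in a few lines of Python (the function names are illustrative, not taken from the GIHP code):

```python
import math

def log50k_transform(ic50_nm: float) -> float:
    """Map an IC50 measurement (in nM) to the [0, 1] range via
    1 - log(IC50)/log(50k), as in Nielsen et al. (2003)."""
    return 1.0 - math.log(ic50_nm) / math.log(50000)

def is_binder(ic50_nm: float, threshold_nm: float = 500.0) -> bool:
    """A peptide is a binder when its IC50 is below the 500 nM threshold,
    i.e., its transformed affinity exceeds log50k_transform(500) ~= 0.426."""
    return log50k_transform(ic50_nm) > log50k_transform(threshold_nm)
```

The 0.426 cutoff quoted in the text is exactly `log50k_transform(500)`.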
To evaluate whether the key binding residues identified by our method can effectively differentiate patients who benefit from immunotherapy, we collected breast, bladder, and pan-cancer treatment datasets from the cBioPortal resource ( Cerami et al., 2012 ), as shown in Table 2 . Mutations at key binding residues can change the binding affinity between HLA and peptides, and binding affinity change has been demonstrated to be a biomarker of immunotherapy efficacy ( Kim et al., 2020 ; Seidel et al., 2021 ; Murata et al., 2022 ). For each patient, only SNP mutations were kept; if a SNP was located on a key binding site of the HLA or peptide, the patient was assigned to one group, and otherwise to the other. We then conducted survival analysis on the two groups.
Table 2 . Immunotherapy related dataset and three cancer datasets.
Samstein-2019 dataset ( Samstein et al., 2019 ): The cohort consisted of 1,662 patients who received at least one dose of immune checkpoint inhibitor (ICI) therapy and encompassed a variety of cancer types, each with an adequate number of patients for analysis. In detail, 146 patients received anti-CTLA4, 1,447 received anti-PD1 or anti-PD-L1, and 189 received both. This pan-cancer dataset includes 350 cases of non-small cell lung cancer (NSCLC), 321 cases of melanoma, 151 cases of renal cell carcinoma (RCC), 214 cases of bladder cancer, and 138 cases of head and neck squamous cell cancer.
Miao-2018 dataset ( Miao et al., 2018 ): This dataset consists of 249 patient tumors from six cancer types: melanoma ( N = 151), non-small cell lung cancer ( N = 57), bladder cancer ( N = 27), head and neck squamous cell carcinoma ( N = 12), anal cancer ( N = 1), and sarcoma ( N = 1). These patients were treated with anti-PD-1 therapy ( N = 74), anti-PD-L1 therapy ( N = 20), anti-CTLA-4 therapy ( N = 145), or a combination of anti-CTLA-4 and anti-PD-1/L1 therapies ( N = 10). A small proportion of patients ( N = 7) received anti-PD-1, anti-PD-L1, or anti-CTLA-4 therapy combined with another immunotherapy, a targeted therapy, or cytotoxic chemotherapy.
Razavi-2018 dataset ( Razavi et al., 2018 ): This dataset is downloaded from cBioPortal: https://cbioportal-datahub.s3.amazonaws.com/breast_msk_2018.tar.gz .
Clinton-2022 dataset ( Clinton et al., 2022 ): This dataset is downloaded from cBioPortal: https://cbioportal-datahub.s3.amazonaws.com/paired_bladder_2022.tar.gz .
Aaltonen-2020 dataset ( Consortium et al., 2020 ): This dataset is downloaded from cBioPortal: https://cbioportal-datahub.s3.amazonaws.com/pancan_pcawg_2020.tar.gz .
The overall framework of GIHP is illustrated in Figure 1 . GIHP takes an HLA structure and a peptide SMILES string as input. In the input representation module, the HLA is represented as an attributed residue-level graph, while the peptide is represented as an attributed atom-level graph. A multi-layer GCNN then learns high-level features, which are concatenated and fed into the MLP layer for the final binding affinity prediction. To enhance the interpretability of the results, we introduce a novel visual interpretation method called Grad-WAM, which leverages gradient information from the last GCN layer to assess the significance of each neuron in determining affinity.
Figure 1 . The overall framework of GIHP.
Graph-based protein structure representation has inherent advantages over traditional sequence-based approaches in capturing true binding events. For each HLA molecule, we take both structure and sequence information into consideration. Given that one of our key objectives is to identify the critical binding amino acid residues, we represent the HLA proteins as residue-level relational graphs G_H = (V, E), where V is the set of amino acids and E is the set of edges. As shown in Table 3 , node attributes integrate sequence and structural properties, including amino acid type, chemical properties, and charges, while edge attributes encompass connection types, distances, and structural information. We consider four types of bond edges: peptide bonds, hydrogen bonds, ionic bonds, and disulfide bridges.
Table 3 . The node features of HLA graph.
Peptides binding to MHC class II are typically 13–25 residues long, while peptides binding to MHC class I are around nine residues; in either case, peptides are short relative to HLA molecules (over 360 aa). In this study, we represent peptides as SMILES-like sequences and then transform them into graphs using a molecular graph representation method inspired by RDKit ( https://www.rdkit.org ). The attributes of each node v_i are shown in Table 4 . An edge e_ij ∈ E is a covalent bond between the ith and jth atoms; edge attributes depend on the number of electrons shared between atoms, corresponding to single, double, or triple bonds.
Table 4 . Node features of peptide graph.
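The atom-level peptide graph can be sketched in pure Python as a bond-list-to-adjacency conversion (a simplified stand-in for the RDKit-based construction; the bond labels and function name are illustrative):

```python
# Edge attribute encodes the covalent bond order (electrons shared), as in Table 4.
BOND_ORDER = {"single": 1, "double": 2, "triple": 3}

def build_atom_graph(num_atoms, bonds):
    """Build a symmetric adjacency matrix from (i, j, bond_type) triples.
    Node attributes (atom type, charge, ...) would live in a parallel feature matrix."""
    adj = [[0] * num_atoms for _ in range(num_atoms)]
    for i, j, bond_type in bonds:
        order = BOND_ORDER[bond_type]
        adj[i][j] = order
        adj[j][i] = order  # covalent bonds are undirected
    return adj
```

In the full pipeline, RDKit would parse the SMILES string and enumerate atoms and bonds; this sketch only shows the resulting graph encoding.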
Let A be the adjacency matrix and X the feature matrix of a given graph. Each GCN layer takes A and the current node embeddings as input and outputs updated embeddings, as shown in Eqs 1 , 2 .
where H^(l) denotes the node embeddings at layer l, with H^(0) = X; W^(l+1) is a trainable weight matrix; and D̂ is the diagonal node degree matrix of A.
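Equations 1, 2 are not reproduced in the text. Assuming the standard GCN propagation rule H^(l+1) = ReLU(D̂^(−1/2) Â D̂^(−1/2) H^(l) W^(l+1)) with self-loops Â = A + I, one layer can be sketched in pure Python:

```python
import math

def gcn_layer(adj, feats, weight):
    """One graph-convolution step on dense Python lists: add self-loops,
    normalize the adjacency symmetrically, aggregate neighbor features,
    apply the linear transform, then ReLU."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg_inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    norm = [[deg_inv_sqrt[i] * a_hat[i][j] * deg_inv_sqrt[j] for j in range(n)]
            for i in range(n)]
    d_in, d_out = len(feats[0]), len(weight[0])
    agg = [[sum(norm[i][k] * feats[k][j] for k in range(n)) for j in range(d_in)]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][k] * weight[k][j] for k in range(d_in)))
             for j in range(d_out)] for i in range(n)]
```

Stacking several such layers yields the high-level node embeddings that GIHP concatenates for the MLP head.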
After obtaining the vector representations of HLA and peptide, they are concatenated and fed into a Multi-Layer Perceptron (MLP) to predict the binding affinity score. The MLP consists of three linear transformation layers, each followed by a Rectified Linear Unit (ReLU) activation function and a dropout layer with a dropout rate of 0.1, as in ( Öztürk et al., 2019 ). The Mean Squared Error (MSE) is employed as the loss function to measure the discrepancy between predicted and actual affinity scores. MSE is defined in Eq. 3 .
where n is the sample size, and P_i and Y_i are the predicted and true values of the ith interaction pair, respectively.
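Assuming Eq. 3 is the standard mean squared error, MSE = (1/n) Σ (P_i − Y_i)², the loss can be sketched as:

```python
def mse_loss(pred, true):
    """Mean Squared Error between predicted (P_i) and measured (Y_i) affinity scores."""
    assert len(pred) == len(true) and pred, "need equal-length, non-empty vectors"
    return sum((p - y) ** 2 for p, y in zip(pred, true)) / len(pred)
```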
While Grad-CAM has been successfully applied to various computer vision tasks, it is not directly applicable to graph-structured data. We therefore propose a novel result-interpretation method called Grad-WAM, which identifies key binding-related residues. Grad-WAM measures the contribution of each residue to the binding decision using the gradient information in the last GCN layer: it uses a weighted combination of the positive partial derivatives of the feature maps with respect to the interaction values to generate the corresponding visual explanations. Because residues do not contribute equally, and unlike the explanation method proposed in MGraphDTA ( Yang et al., 2022 ), we introduce an additional weight ω (Eq. 4 ) on the gradient values.
where ReLU is the activation function, P is the predicted value as in Eq. 5 , T_i is the feature value of the ith node on the feature map T of the last GCN layer, α_i is the gradient value of the ith node defined in Eq. 6 , and ∂P/∂T_i is the partial derivative as in Eq. 7 .
In this way, the contribution of each residue to the predicted binding affinity is calculated. For visual explanation, residues are displayed in colors ranging from blue to red; a higher gradient value corresponds to a redder color, indicating a key role for that amino acid in the interaction.
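Since Eqs 4–7 are not reproduced here, the following is only a schematic of a Grad-WAM-style score, following the prose: weight the positive gradients of the last-layer feature map, combine them with the feature values, and keep positive contributions via ReLU. All details (in particular the form of the weight ω) are assumptions, not the exact GIHP equations:

```python
def grad_wam_scores(features, gradients):
    """Schematic residue-importance score: omega_i * alpha_i * T_i passed through
    ReLU, with omega_i taken here as the node's share of the total positive
    gradient (an assumption for illustration)."""
    relu = lambda x: max(0.0, x)
    pos = [relu(g) for g in gradients]          # positive partial derivatives
    total = sum(pos) or 1.0                     # guard against all-negative gradients
    return [relu((p / total) * g * t)           # weighted, ReLU-clipped contribution
            for p, g, t in zip(pos, gradients, features)]
```

Scores computed this way can be mapped onto a blue-to-red color scale for the visual explanation described above.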
Four widely used performance metrics were employed to measure the methods’ performance: accuracy ( Acc ), Matthews correlation coefficient ( MCC ), sensitivity ( Sn ), and specificity ( Sp ). These four metrics are defined in Eqs 8 – 11 .
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. Predictions were assessed as true or false by comparing predicted and true values. Receiver operating characteristic (ROC) curves were generated for all methods, and each algorithm’s ability to discriminate binders from non-binders was quantified by the area under the ROC curve (AUC) as an estimate of prediction performance.
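Eqs 8–11 are assumed here to be the textbook definitions of these four metrics, which can be computed directly from the confusion-matrix counts:

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Acc, Sn, Sp, and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)  # sensitivity: recall on binders
    sp = tn / (tn + fp)  # specificity: recall on non-binders
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, sn, sp, mcc
```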
We compared GIHP with state-of-the-art allele-specific and pan-specific baselines, including NetMHC-4.0 ( Andreatta and Nielsen, 2016 ), NetMHCpan-4.0 ( Jurtz et al., 2017 ), PickPocket-1.1 ( Zhang et al., 2009 ), SMMPMBEC ( Kim et al., 2009 ), MHCflurry ( O'Donnell et al., 2018 ), MixMHCpred-2.0 ( Bassani-Sternberg et al., 2017 ), and NetMHCcons-1.1 ( Karosiene et al., 2012 ). To eliminate the impact of data variations, all models were retrained and tested on our newly collected and processed dataset using 10-fold cross-validation (CV): the dataset is divided into 10 folds; in each iteration, one fold is designated as the validation set while the remaining nine are used to train the model; and the final performance is the average over the 10 iterations. As shown in Figure 2 , on average GIHP outperforms all the compared prediction methods. It is worth noting that not every method is suitable for every HLA and peptide length; to make the comparison fairer and more reasonable, we trained allele-specific models only on the HLAs and peptide lengths they require that are included in our datasets.
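The 10-fold CV protocol described above can be sketched as an index generator (an illustrative sketch, not the actual evaluation code):

```python
def kfold_splits(n_samples, k=10):
    """Yield (train_indices, val_indices) pairs: each fold serves once as the
    validation set, while the remaining k-1 folds form the training set.
    Final performance is the caller's average over the k iterations."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, val
```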
Figure 2 . Performance comparison results.
To make comparisons fairer and to test the methods’ performance on other protein-peptide binding data, a separate independent test was conducted using the data collected from pepBDB, which has no overlap with the training data above. This independent test set serves as an unbiased validation source for assessing the performance of different tools and for testing the models’ generalization ability. 10-fold cross-validation was applied, and average results were calculated after each epoch. Results on the pepBDB independent test data are shown in Figure 3 .
Figure 3 . Independent test results on pepBDB datasets.
On average, GIHP achieved the highest AUC. On this independent test data, GIHP attained the highest AUC of 0.88 and the highest Sp score of 0.98. In contrast, NetMHCpan-4.0 and PickPocket-1.1 attained AUC values of 0.76 or lower and Acc scores of 0.71 or lower on this new dataset. Unlike in the results above, MHCflurry reached an AUC of up to 0.8; like our method, MHCflurry harnesses deep learning and a comprehensive dataset to improve the prediction of HLA-peptide binding affinities. Our model outperforms both allele-specific and pan-specific methods, demonstrating higher prediction accuracy and robust generalization across all kinds of training data.
To evaluate the performance of our method for different peptide lengths, we collected an independent test set and an external test set from TransPHLA, which can be downloaded from https://github.com/a96123155/TransPHLA-AOMP/tree/master/Dataset . In these datasets, 9-mer peptides comprise the largest proportion, while 13-mer and 14-mer peptides are very few. Our model’s performance on the independent and external test sets for different peptide lengths is shown in Figures 4A, B , respectively; as Figure 4 shows, our method achieves good performance across all peptide lengths.
Figure 4 . The performance of our model on the independent test set and external test set for the different peptide lengths. (A) Performance on the independent test set. (B) Performance on the external test set.
The binding of peptides to HLA molecules occurs within specialized regions called binding pockets. HLA class I molecules have a peptide-binding groove formed by two alpha helices (α1 and α2) and a beta-sheet platform. Within this groove, six pockets (labeled A to F, shown in Figure 5A ) interact with specific amino acid residues of the bound peptide. HLA class II molecules present peptides derived from extracellular proteins to helper T cells. Their binding pockets are formed by two chains, the alpha chain (α) and the beta chain (β), each consisting of two domains: the α1 and β1 domains form the peptide-binding groove, while the α2 and β2 domains provide structural support. The binding groove of HLA class II molecules is open at both ends, allowing longer peptides to bind than in HLA class I molecules. The binding pockets in HLA class II molecules are referred to as P1, P4, P6, P7, and P9 ( Figure 5B ). With the GIHP result-interpretation module, many key binding residues on both HLA classes and the corresponding peptides were identified. Although some residues with high activity scores are located outside the binding pockets, most fall within one of them. As shown in Figures 5C,D , the 45 residues with the highest activity scores on HLAs were identified; among them, 26 are located on HLA class I pockets and 19 on HLA class II pockets.
Figure 5 . The key binding residues on HLA pockets and HLA binding peptide motifs. (A) Binding pockets on HLA class I molecules. (B) Binding pockets on HLA class II molecules. (C) The identified key binding residue locations and activity scores on each pocket of HLA class I molecules, where R represents the residue location on the analyzed HLA molecules. (D) The identified key binding residue locations and activity scores on each pocket of HLA class II molecules. (E) Distribution of preferred peptide residues of HLA class I molecules using Seq2Logo-2.0. (F) Distribution of preferred peptide residues of HLA class II molecules using Seq2Logo-2.0.
Position 159 has the highest activity score in pocket A; other identified positions include 59, 171, 167, 7, and 66. According to current research, position 7 is a floor residue of pocket A: it creates a hydrophobic environment within the pocket and interacts with the side chain of the anchor residue. Although there is no evidence that residue 159 is directly involved in peptide binding interactions, it has structural and functional implications for the overall stability and conformation of the pocket A region ( Ma et al., 2020 ). It potentially contributes to the shape and electrostatic properties of the pocket, indirectly affecting the binding preferences and stability of the peptides presented by the HLA class I molecule. However, the specific role and impact of residue 159 on pocket A’s function vary among HLA alleles and require further study for a comprehensive understanding. In pocket B, substitutions at position 70 were found to yield a significantly distinct peptide-binding repertoire in HLA-A molecules compared to HLA-B molecules. Positions 167 and 67 in pocket B have been demonstrated to be key peptide-binding residues, and substitutions at positions 67 and 9 exert a significant influence on the peptide-binding repertoire ( van Deutekom and Keşmir, 2015 ). Position 97 has the highest activity score in pocket C and is known to be a critical residue for peptide binding and presentation. This residue is located near the C-terminal anchor residue of the bound peptide and contributes to the formation of the peptide-binding groove. The amino acid at position 97 can significantly influence the peptide-binding specificity and affinity of the HLA molecule; substitutions or variations at this position can alter the size, shape, or electrostatic properties of pocket C, thereby affecting the recognition and binding of specific peptides.
Several studies have investigated the impact of position 97 on peptide binding and immunological responses ( Moutaftsi et al., 2006 ).
Considering the residues with high activity scores on HLA class II pockets, position 9 is crucial for determining the peptide-binding specificity of the HLA class II molecule: the amino acid at position 9 of the bound peptide interacts with residues in the P1 pocket, influencing peptide-binding preferences. Position 86 plays a critical role in peptide binding and presentation ( Brown et al., 1993 ); the amino acid at this position interacts with the peptide residue and contributes to the stability and specificity of the HLA class II-peptide complex ( Stern et al., 1994 ). Among the important positions we identified, positions 13 and 74 are critical for determining the peptide-binding specificity and stability of HLA class II molecules; the interactions between peptide residues and the residues in these pockets are essential for the recognition and presentation of antigenic peptides to CD4 + T cells. Beyond these positions, we also prioritized many other residues, such as positions 63 and 57. Characterizing these positions within the peptide-binding grooves of HLA class II molecules is crucial for understanding the molecular basis of antigen presentation and immune responses, provides valuable information about the molecular interactions governing T cell recognition, and can help in designing personalized immunotherapies ( Boukouaci et al., 2024 ).
Figures 5E, F show the motif analysis results. In both figures, the Y-axis gives the information content in bits and the X-axis the position in the alignment. At each position there is a stack of symbols representing the amino acids observed there: large symbols represent frequently observed amino acids, tall stacks represent conserved positions, and short stacks represent variable positions. Accordingly, positions 2, 4, and 9 have frequently observed amino acids in HLA class I and class II, respectively.
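The per-position information content (in bits) on the logo's Y-axis follows, in the standard Shannon formulation used by logo tools, from the maximum entropy log2(20) minus the observed entropy of the amino-acid frequencies at that position (small-sample corrections applied by the actual tool are omitted in this sketch):

```python
import math

def information_bits(column, alphabet_size=20):
    """Information content of one alignment column: log2(|alphabet|) minus the
    Shannon entropy of the observed residue frequencies. Conserved columns
    score high (tall stacks); variable columns score low (short stacks)."""
    counts = {}
    for residue in column:
        counts[residue] = counts.get(residue, 0) + 1
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy
```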
In this paper, we focus on finding key residues related to immunotherapy efficacy and their corresponding genes. Given the identified residue positions and the corresponding gene mutations, we verify whether they can serve as biomarkers separating patients into different survival groups. We applied GIHP to the immunotherapy-related datasets (Samstein-2019 and Miao-2018 in Table 2 ). For each SNP mutation site, we extracted the 9-mer peptide around it and predicted the binding affinities with all 223 HLAs. By statistically comparing (paired t -test) the binding affinity before and after residue substitution, together with the GIHP activity scores of each residue, significant key binding residues were identified. To determine the functions of the genes carrying these mutations, we conducted GO enrichment analysis with ShinyGO-0.80 ( Ge et al., 2020 ). As shown in Figure 6 , most key residues are located in genes related to pathways in cancer and cancer-related signaling pathways.
Figure 6 . GO enrichment results of key residues related genes.
Since we are interested in mutations related to immunotherapy response, we further analyzed the key residues enriched in the T cell receptor signaling pathway ( Figure 6 ). The enriched genes include RHOA, HLA-B, HRAS, IL10, NRAS, and KRAS. RHOA has been implicated in T cell activation and migration, which are critical for effective anti-tumor immune responses ( Bros et al., 2019 ); altered RHOA signaling could affect T cell function and infiltration into the tumor microenvironment, influencing immunotherapy response. HLA-B plays a crucial role in immune recognition, presenting peptide antigens derived from intracellular proteins to cytotoxic T cells. HRAS, NRAS, and KRAS belong to the RAS family of oncogenes and encode proteins in intracellular signaling pathways regulating cell growth, survival, and proliferation; RAS mutations have been associated with poorer response rates to certain immunotherapies, including immune checkpoint inhibitors ( East et al., 2022 ). IL10 can suppress the activity of cytotoxic T cells and natural killer (NK) cells, which are critical for tumor surveillance and elimination; high levels of IL10 in the tumor microenvironment have been associated with immunosuppression and reduced response to immunotherapy ( Salkeni and Naing, 2023 ).
Next, we investigated the impact of biomarker gene mutations on patient survival outcomes using a cohort of immunotherapy-treated individuals (Samstein-2019 dataset in Table 2 ). Patients were categorized into two groups based on the presence or absence of a biomarker gene mutation. Kaplan-Meier survival curves were generated, and a log-rank test was performed to compare survival between the two groups. The results revealed a significant difference: patients harboring a biomarker gene mutation exhibited a higher risk of adverse events than those without. These findings highlight the potential prognostic significance of the biomarker gene mutations and underscore their relevance for patient stratification and personalized treatment. Furthermore, we compared our results with the TMB score provided in Samstein et al. (2019) . As shown in Figure 7 , patients with biomarker mutations tend to have poorer survival.
Figure 7 . Results on immunotherapy data. (A) patient groups separated by GIHP identified biomarker mutations. (B) TMB separated patient groups.
As shown in Figure 7 , our method separates patients more significantly. Although TMB can also separate patients, it is an overall measure, and it is hard to know which gene mutations play key roles in differentiating patients’ responses. Our method not only separates patients significantly but also identifies which residue substitutions play key roles. To further test these biomarker genes, we analyzed the Miao-2018 dataset ( Table 2 ); the results are shown in Figure 8 .
Figure 8 . Results on Miao-2018 datasets.
As illustrated in Figure 8 , the identified biomarker mutations are also able to effectively separate patient groups with statistical significance. Our findings provide compelling evidence that the identified biomarker genes may possess valuable predictive power for immunotherapy response and patient survival outcomes. This highlights their potential as clinically relevant targets for the development of personalized treatment approaches. The results of this study advance the understanding of the underlying molecular mechanisms governing immunotherapy efficacy, and offer promising directions for future research and therapeutic interventions.
In this section, we test whether these key residue mutations and their corresponding genes can stratify patients with other cancers. The results are shown in Figures 9A–C , and detailed information on the three cancer datasets is given in Table 2 . Our biomarker genes differentiate patient groups significantly in all three datasets, especially the pan-cancer dataset.
Figure 9 . Survival curves on breast, bladder and pan cancer datasets.
In summary, we propose a new GCNN-based framework, GIHP, for pan-specific HLA-peptide binding affinity prediction. GIHP harnesses both structure and sequence information and uses Grad-WAM for visual interpretation. Extensive comparison with state-of-the-art methods verified the superior performance of our method. Collectively, the findings provide evidence that the GIHP framework improves the generalization and interpretability of HLA-peptide binding prediction models. Furthermore, we identified numerous key binding-related amino acid residues that can serve as potential biomarkers for differentiating patient groups by immunotherapy response. When applied to datasets from other cancer types, these biomarkers also differentiated patient groups with statistical significance. These findings highlight the potential prognostic significance of the biomarker gene mutations and underscore their relevance for patient stratification and personalized immunotherapy.
The data presented in the study are deposited on GitHub: https://github.com/sdustSu/GIHP .
LS: Funding acquisition, Methodology, Writing–original draft, Writing–review and editing. YY: Formal Analysis, Methodology, Validation, Visualization, Writing–review and editing. BM: Data curation, Formal Analysis, Investigation, Writing–review and editing. SZ: Formal Analysis, Methodology, Resources, Visualization, Writing–review and editing. ZC: Conceptualization, Project administration, Resources, Supervision, Writing–original draft, Writing–review and editing.
The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work is supported by Natural Science Foundation of Shandong Province (Youth Program, Grant No. ZR2022QF136), the Elite Program of Shandong University of Science and Technology and the National Science Foundation of China (Grant No. 62302277).
Author YY was employed by Shandong Guohe Industrial Technology Research Institute Co. Ltd. BM was employed by Qingdao UNIC Information Technology Co. Ltd.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Andreatta, M., and Nielsen, M. (2016). Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics 32, 511–517. doi:10.1093/bioinformatics/btv639
Baek, M., DiMaio, F., Anishchenko, I., Dauparas, J., Ovchinnikov, S., Lee, G. R., et al. (2021). Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876. doi:10.1126/science.abj8754
Bassani-Sternberg, M., Chong, C., Guillaume, P., Solleder, M., Pak, H., Gannon, P. O., et al. (2017). Deciphering HLA-I motifs across HLA peptidomes improves neo-antigen predictions and identifies allostery regulating HLA specificity. PLoS Comput. Biol. 13, e1005725. doi:10.1371/journal.pcbi.1005725
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., et al. (2000). The protein data bank. Nucleic Acids Res. 28, 235–242. doi:10.1093/nar/28.1.235
Boukouaci, W., Rivera-Franco, M. M., Volt, F., Lajnef, M., Wu, C. L., Rafii, H., et al. (2024). HLA peptide-binding pocket diversity modulates immunological complications after cord blood transplant in acute leukaemia. Br. J. Haematol. 204, 1920–1934. doi:10.1111/bjh.19339
Bros, M., Haas, K., Moll, L., and Grabbe, S. (2019). RhoA as a key regulator of innate and adaptive immunity. Cells 8, 733. doi:10.3390/cells8070733
Brown, J. H., Jardetzky, T. S., Gorga, J. C., Stern, L. J., Urban, R. G., Strominger, J. L., et al. (1993). Three-dimensional structure of the human class II histocompatibility antigen HLA-DR1. Nature 364, 33–39. doi:10.1038/364033a0
Cerami, E., Gao, J., Dogrusoz, U., Gross, B. E., Sumer, S. O., Aksoy, B. A., et al. (2012). The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404. doi:10.1158/2159-8290.CD-12-0095
Cheng, J., Bendjama, K., Rittner, K., and Malone, B. (2021). BERTMHC: improved MHC-peptide class II interaction prediction with transformer and multiple instance learning. Bioinformatics 37, 4172–4179. doi:10.1093/bioinformatics/btab422
Clinton, T. N., Chen, Z., Wise, H., Lenis, A. T., Chavan, S., Donoghue, M. T. A., et al. (2022). Genomic heterogeneity as a barrier to precision oncology in urothelial cancer. Cell Rep. 41, 111859. doi:10.1016/j.celrep.2022.111859
ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium (2020). Pan-cancer analysis of whole genomes. Nature 578, 82–93. doi:10.1038/s41586-020-1969-6
East, P., Kelly, G. P., Biswas, D., Marani, M., Hancock, D. C., Creasy, T., et al. (2022). RAS oncogenic activity predicts response to chemotherapy and outcome in lung adenocarcinoma. Nat. Commun. 13, 5632. doi:10.1038/s41467-022-33290-0
Ge, S. X., Jung, D., and Yao, R. (2020). ShinyGO: a graphical gene-set enrichment tool for animals and plants. Bioinformatics 36, 2628–2629. doi:10.1093/bioinformatics/btz931
Gfeller, D., Guillaume, P., Michaux, J., Pak, H. S., Daniel, R. T., Racle, J., et al. (2018). The length distribution and multiple specificity of naturally presented HLA-I ligands. J. Immunol. 201, 3705–3716. doi:10.4049/jimmunol.1800914
Giguere, S., Drouin, A., Lacoste, A., Marchand, M., Corbeil, J., and Laviolette, F. (2013). MHC-NP: predicting peptides naturally processed by the MHC. J. Immunol. Methods 400-401, 30–36. doi:10.1016/j.jim.2013.10.003
Gizinski, S., Preibisch, G., Kucharski, P., Tyrolski, M., Rembalski, M., Grzegorczyk, P., et al. (2024). Enhancing antigenic peptide discovery: improved MHC-I binding prediction and methodology. Methods 224, 1–9. doi:10.1016/j.ymeth.2024.01.016
Jensen, K. K., Andreatta, M., Marcatili, P., Buus, S., Greenbaum, J. A., Yan, Z., et al. (2018). Improved methods for predicting peptide binding affinity to MHC class II molecules. Immunology 154, 394–406. doi:10.1111/imm.12889
Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589. doi:10.1038/s41586-021-03819-2
Jurtz, V., Paul, S., Andreatta, M., Marcatili, P., Peters, B., and Nielsen, M. (2017). NetMHCpan-4.0: improved peptide-MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J. Immunol. 199, 3360–3368. doi:10.4049/jimmunol.1700893
Kallingal, A., Olszewski, M., Maciejewska, N., Brankiewicz, W., and Baginski, M. (2023). Cancer immune escape: the role of antigen presentation machinery. J. Cancer Res. Clin. Oncol. 149, 8131–8141. doi:10.1007/s00432-023-04737-8
Karosiene, E., Lundegaard, C., Lund, O., and Nielsen, M. (2012). NetMHCcons: a consensus method for the major histocompatibility complex class I predictions. Immunogenetics 64, 177–186. doi:10.1007/s00251-011-0579-8
Karosiene, E., Rasmussen, M., Blicher, T., Lund, O., Buus, S., and Nielsen, M. (2013). NetMHCIIpan-3.0, a common pan-specific MHC class II prediction method including all three human MHC class II isotypes, HLA-DR, HLA-DP and HLA-DQ. Immunogenetics 65, 711–724. doi:10.1007/s00251-013-0720-y
Kim, K., Kim, H. S., Kim, J. Y., Jung, H., Sun, J. M., Ahn, J. S., et al. (2020). Predicting clinical benefit of immunotherapy by antigenic or functional mutations affecting tumour immunogenicity. Nat. Commun. 11, 951. doi:10.1038/s41467-020-14562-z
Kim, Y., Sidney, J., Buus, S., Sette, A., Nielsen, M., and Peters, B. (2014). Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions. BMC Bioinforma. 15, 241. doi:10.1186/1471-2105-15-241
Kim, Y., Sidney, J., Pinilla, C., Sette, A., and Peters, B. (2009). Derivation of an amino acid similarity matrix for peptide: MHC binding and its application as a Bayesian prior. BMC Bioinforma. 10, 394. doi:10.1186/1471-2105-10-394
Lundegaard, C., Lamberth, K., Harndahl, M., Buus, S., Lund, O., and Nielsen, M. (2008). NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8-11. Nucleic Acids Res. 36, W509–W512. doi:10.1093/nar/gkn202
Ma, L., Zhang, N., Qu, Z., Liang, R., Zhang, L., Zhang, B., et al. (2020). A glimpse of the peptide profile presentation by Xenopus laevis MHC class I: crystal structure of pXela-UAA reveals a distinct peptide-binding groove. J. Immunol. 204, 147–158. doi:10.4049/jimmunol.1900865
Meng, Z., Chen, C., Zhang, X., Zhao, W., and Cui, X. (2024). Exploring fragment adding strategies to enhance molecule pretraining in AI-driven drug discovery. Big Data Min. Anal., 1–12. doi:10.26599/bdma.2024.9020003
Miao, D., Margolis, C. A., Vokes, N. I., Liu, D., Taylor-Weiner, A., Wankowicz, S. M., et al. (2018). Genomic correlates of response to immune checkpoint blockade in microsatellite-stable solid tumors. Nat. Genet. 50, 1271–1281. doi:10.1038/s41588-018-0200-2
Moutaftsi, M., Peters, B., Pasquetto, V., Tscharke, D. C., Sidney, J., Bui, H. H., et al. (2006). A consensus epitope prediction approach identifies the breadth of murine T(CD8+)-cell responses to vaccinia virus. Nat. Biotechnol. 24, 817–819. doi:10.1038/nbt1215
Murata, K., Ly, D., Saijo, H., Matsunaga, Y., Sugata, K., Ihara, F., et al. (2022). Modification of the HLA-A*24:02 peptide binding pocket enhances cognate peptide-binding capacity and antigen-specific T cell activation. J. Immunol. 209, 1481–1491. doi:10.4049/jimmunol.2200305
Nielsen, M., Lundegaard, C., and Lund, O. (2007). Prediction of MHC class II binding affinity using SMM-align, a novel stabilization matrix alignment method. BMC Bioinforma. 8, 238. doi:10.1186/1471-2105-8-238
Nielsen, M., Lundegaard, C., Worning, P., Lauemoller, S. L., Lamberth, K., Buus, S., et al. (2003). Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci. 12, 1007–1017. doi:10.1110/ps.0239403
O’Donnell, T. J., Rubinsteyn, A., Bonsack, M., Riemer, A. B., Laserson, U., and Hammerbacher, J. (2018). MHCflurry: open-source class I MHC binding affinity prediction. Cell Syst. 7, 129–132. doi:10.1016/j.cels.2018.05.014
O’Donnell, T. J., Rubinsteyn, A., and Laserson, U. (2020). MHCflurry 2.0: improved pan-allele prediction of MHC class I-presented peptides by incorporating antigen processing. Cell Syst. 11, 418–419. doi:10.1016/j.cels.2020.09.001
Öztürk, H., Ozkirimli, E., and Özgür, A. (2019). WideDTA: prediction of drug-target binding affinity. arXiv preprint arXiv:1902.04166.
Quiros, M., Grazulis, S., Girdzijauskaite, S., Merkys, A., and Vaitkus, A. (2018). Using SMILES strings for the description of chemical connectivity in the Crystallography Open Database. J. Cheminform 10, 23. doi:10.1186/s13321-018-0279-6
Razavi, P., Chang, M. T., Xu, G., Bandlamudi, C., Ross, D. S., Vasan, N., et al. (2018). The genomic landscape of endocrine-resistant advanced breast cancers. Cancer Cell 34, 427–438. doi:10.1016/j.ccell.2018.08.008
Reynisson, B., Alvarez, B., Paul, S., Peters, B., and Nielsen, M. (2020). NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data. Nucleic Acids Res. 48, W449–W454. doi:10.1093/nar/gkaa379
Salkeni, M. A., and Naing, A. (2023). Interleukin-10 in cancer immunotherapy: from bench to bedside. Trends Cancer 9, 716–725. doi:10.1016/j.trecan.2023.05.003
Samstein, R. M., Lee, C. H., Shoushtari, A. N., Hellmann, M. D., Shen, R., Janjigian, Y. Y., et al. (2019). Tumor mutational load predicts survival after immunotherapy across multiple cancer types. Nat. Genet. 51, 202–206. doi:10.1038/s41588-018-0312-8
Seidel, R. D., Merazga, Z., Thapa, D. R., Soriano, J., Spaulding, E., Vakkasoglu, A. S., et al. (2021). Peptide-HLA-based immunotherapeutics platforms for direct modulation of antigen-specific T cells. Sci. Rep. 11, 19220. doi:10.1038/s41598-021-98716-z
Stern, L. J., Brown, J. H., Jardetzky, T. S., Gorga, J. C., Urban, R. G., Strominger, J. L., et al. (1994). Crystal structure of the human class II MHC protein HLA-DR1 complexed with an influenza virus peptide. Nature 368, 215–221. doi:10.1038/368215a0
van Deutekom, H. W. M., and Keşmir, C. (2015). Zooming into the binding groove of HLA molecules: which positions and which substitutions change peptide binding most? Immunogenetics 67, 425–436. doi:10.1007/s00251-015-0849-y
Varadi, M., Anyango, S., Deshpande, M., Nair, S., Natassia, C., Yordanova, G., et al. (2022). AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444. doi:10.1093/nar/gkab1061
Venkatesh, G., Grover, A., Srinivasaraghavan, G., and Rao, S. (2020). MHCAttnNet: predicting MHC-peptide bindings for MHC alleles classes I and II using an attention-based deep neural model. Bioinformatics 36, i399–i406. doi:10.1093/bioinformatics/btaa479
Vita, R., Mahajan, S., Overton, J. A., Dhanda, S. K., Martini, S., Cantrell, J. R., et al. (2019). The immune epitope database (IEDB): 2018 update. Nucleic Acids Res. 47, D339–D343. doi:10.1093/nar/gky1006
Wang, M., and Claesson, M. H. (2014). Classification of human leukocyte antigen (HLA) supertypes. Methods Mol. Biol. 1184, 309–317. doi:10.1007/978-1-4939-1115-8_17
Wang, P., Sidney, J., Dow, C., Mothe, B., Sette, A., and Peters, B. (2008). A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach. PLoS Comput. Biol. 4, e1000048. doi:10.1371/journal.pcbi.1000048
Wang, P., Sidney, J., Kim, Y., Sette, A., Lund, O., Nielsen, M., et al. (2010). Peptide binding predictions for HLA DR, DP and DQ molecules. BMC Bioinforma. 11, 568. doi:10.1186/1471-2105-11-568
Wang, X., Wu, T., Jiang, Y., Chen, T., Pan, D., Jin, Z., et al. (2024). RPEMHC: improved prediction of MHC-peptide binding affinity by a deep learning approach based on residue-residue pair encoding. Bioinformatics 40, btad785. doi:10.1093/bioinformatics/btad785
Wang, Y., Jiao, Q., Wang, J., Cai, X., Zhao, W., and Cui, X. (2023). Prediction of protein-ligand binding affinity with deep learning. Comput. Struct. Biotechnol. J. 21, 5796–5806. doi:10.1016/j.csbj.2023.11.009
Wen, Z., He, J., Tao, H., and Huang, S. Y. (2019). PepBDB: a comprehensive structural database of biological peptide-protein interactions. Bioinformatics 35, 175–177. doi:10.1093/bioinformatics/bty579
Yang, Z., Zhong, W., Zhao, L., and Yu-Chian Chen, C. (2022). MGraphDTA: deep multiscale graph neural network for explainable drug-target binding affinity prediction. Chem. Sci. 13, 816–833. doi:10.1039/d1sc05180f
You, R., Qu, W., Mamitsuka, H., and Zhu, S. (2022). DeepMHCII: a novel binding core-aware deep interaction model for accurate MHC-II peptide binding affinity prediction. Bioinformatics 38, i220–i228. doi:10.1093/bioinformatics/btac225
Zhang, H., Lund, O., and Nielsen, M. (2009). The PickPocket method for predicting binding specificities for receptors based on receptor pocket similarities: application to MHC-peptide binding. Bioinformatics 25, 1293–1299. doi:10.1093/bioinformatics/btp137
Zhao, W., and Sher, X. (2018). Systematically benchmarking peptide-MHC binding predictors: from synthetic to naturally processed epitopes. PLoS Comput. Biol. 14, e1006457. doi:10.1371/journal.pcbi.1006457
Keywords: HLA-peptide binding, model interpretation, GCNN, immunotherapy, affinity prediction
Citation: Su L, Yan Y, Ma B, Zhao S and Cui Z (2024) GIHP: Graph convolutional neural network based interpretable pan-specific HLA-peptide binding affinity prediction. Front. Genet. 15:1405032. doi: 10.3389/fgene.2024.1405032
Received: 22 March 2024; Accepted: 20 June 2024; Published: 10 July 2024.
Copyright © 2024 Su, Yan, Ma, Zhao and Cui. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Zhenyu Cui, [email protected]
Over time, several variations of GCNs have been developed for graph-structured data. GCNs have also proven effective for message propagation in applications involving image and video data.
Furthermore, GIHP includes a novel visual explanation method, Grad-WAM, for interpreting HLA-peptide binding affinity predictions. By analyzing the learned representations and interactions within the graph structure, Grad-WAM identifies the residues that contribute most significantly to the HLA-peptide binding process.
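The general idea behind such gradient-weighted explanation methods can be sketched in a few lines. The toy below is an illustrative Grad-CAM-style computation on a single GCN layer, not the paper's actual Grad-WAM implementation: the layer, readout, and all weights here are hypothetical placeholders chosen only to show how gradients of the predicted affinity can be combined with node activations to score residues.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One symmetrically normalized graph convolution with ReLU:
    H = ReLU(D^{-1/2} (A + I) D^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])                # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)

def residue_importance(A, X, W, w_out):
    """Gradient-weighted activation scores per node (residue).

    The scalar prediction y uses sum-pooling over node embeddings,
    so dy/dH is w_out for every node, masked by the ReLU. Weighting
    the activations by these gradients and summing per node gives a
    Grad-CAM-style importance score for each residue."""
    H = gcn_layer(A, X, W)                        # node embeddings (n x d)
    y = H.sum(axis=0) @ w_out                     # scalar "affinity"
    grad = (H > 0) * w_out                        # dy/dH, ReLU-masked (n x d)
    scores = np.maximum((grad * H).sum(axis=1), 0.0)
    return y, scores

# Tiny 3-residue path graph with made-up features and weights.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
X = np.array([[1., 0.], [0., 1.], [1., 1.]])
W = np.array([[0.5, -0.2], [0.3, 0.4]])
w_out = np.array([1.0, -0.5])
y, scores = residue_importance(A, X, W, w_out)
```

Each entry of `scores` is a non-negative importance value for one residue; in an actual interpretation pipeline these would be mapped back onto the peptide and HLA sequences to highlight the positions driving the prediction.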