A list of completed theses and new thesis topics from the Computer Vision Group.
Are you about to start a BSc or MSc thesis? Please read our instructions for preparing and delivering your work.
Below we list possible thesis topics for Bachelor and Master students in the areas of Computer Vision, Machine Learning, Deep Learning and Pattern Recognition. The project descriptions leave plenty of room for your own ideas. If you would like to discuss a topic in detail, please contact the supervisor listed below and Prof. Paolo Favaro to schedule a meeting. Note that for MSc students in Computer Science it is required that the official advisor is a professor in CS.
AI deconvolution of light microscopy images
Level: master.
Background Light microscopy has become an indispensable tool in life sciences research. Deconvolution is an important image processing step for improving the quality of microscopy images: it removes out-of-focus light, increases resolution and improves the signal-to-noise ratio. Classical deconvolution methods, such as regularised or blind deconvolution, are currently implemented in numerous commercial software packages and widely used in research. More recently, AI deconvolution algorithms have been introduced and are being actively developed, as they have shown high application potential.
Aim Adaptation of available AI algorithms for the deconvolution of microscopy images. Validation of these methods against state-of-the-art commercially available deconvolution software.
Material and Methods The student will implement and further develop available AI deconvolution methods and acquire test microscopy images of different modalities. The performance of the developed AI algorithms will be validated against available commercial deconvolution software.
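For the validation step it helps to have a classical reference point. The sketch below is a minimal example of regularised (Tikhonov) deconvolution, one of the classical method families mentioned above, scored with PSNR against a known ground truth. The Gaussian PSF, periodic boundaries, regularisation weight `lam` and the synthetic test image are all assumptions for illustration; real validation would use measured or theoretical PSFs and noisy acquisitions.

```python
import numpy as np


def gaussian_otf(shape, sigma):
    """Transfer function of an assumed Gaussian PSF (std `sigma` pixels, periodic boundaries)."""
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx**2 + fy**2))


def blur(image, otf):
    """Circular convolution with the PSF, applied in the Fourier domain."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * otf))


def tikhonov_deconvolve(observed, otf, lam=1e-3):
    """Regularised inverse filter: argmin_x ||h * x - y||^2 + lam * ||x||^2."""
    Y = np.fft.fft2(observed)
    X = np.conj(otf) * Y / (np.abs(otf) ** 2 + lam)
    return np.real(np.fft.ifft2(X))


def psnr(reference, test):
    mse = np.mean((reference - test) ** 2)
    return 10.0 * np.log10(reference.max() ** 2 / mse)


# Synthetic "specimen": bright structures on a dim background (illustrative only)
truth = np.full((64, 64), 0.1)
truth[10:20, 10:20] = 1.0
truth[35:50, 30:40] = 0.6

otf = gaussian_otf(truth.shape, sigma=1.5)
observed = blur(truth, otf)
restored = tikhonov_deconvolve(observed, otf)
print(f"observed PSNR: {psnr(truth, observed):.1f} dB")
print(f"restored PSNR: {psnr(truth, restored):.1f} dB")
```

An AI method would replace `tikhonov_deconvolve` while keeping the same evaluation harness, so that learned and classical (or commercial) results are compared on identical data.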
![](https://www.cvg.unibe.ch/media/uploads/davtyan/2023/10/27/image.png)
Nature of the Thesis:
- AI algorithm development and implementation: 50%.
- Data acquisition: 10%.
- Comparison of performance: 40%.
Requirements
- Interest in imaging.
- Solid knowledge of AI.
- Good programming skills.
Supervisors Paolo Favaro, Guillaume Witz, Yury Belyaev.
Institutes Computer Vision Group, Digital Science Lab, Microscopy Imaging Center.
Contact Yury Belyaev, Microscopy Imaging Center, [email protected], +41 78 899 0110.
Instance segmentation of cryo-ET images
Level: bachelor/master.
In the 1600s, the pioneering Dutch scientist Antonie van Leeuwenhoek embarked on a remarkable journey that would forever transform our understanding of the natural world. Using simple yet ingenious single-lens microscopes of his own making, he peered into uncharted territory and revealed the hidden wonders of microscopic structures. Today, cryo-electron tomography (cryo-ET) has emerged as a groundbreaking technique that allows researchers to study proteins within their natural cellular environment. Proteins act as vital nano-machines that play crucial roles in life, and understanding their localization and interactions is key to both basic research and the understanding of disease. However, cryo-ET images are challenging to analyze because of their inherent noise and the scarcity of annotated data for training deep learning models.
![](https://www.cvg.unibe.ch/media/uploads/davtyan/2023/07/17/image.png)
Credit: S. Albert et al./PNAS (CC BY 4.0)
To address these challenges, this project aims to develop a self-supervised pipeline that uses diffusion models for instance segmentation in cryo-ET images. Diffusion models learn to reverse a gradual noising process, generating realistic images by iteratively denoising random noise; here they will be used to synthesize training data so that cryo-ET images can be segmented accurately. Self-supervised learning, which relies on unlabeled data, reduces the dependence on extensive manual annotations. A successful pipeline could substantially advance structural biology by facilitating the analysis of protein distribution and organization within cells. It could also alleviate the shortage of annotated data, enabling more efficient extraction of information from cryo-ET images and advancing biomedical applications through a better understanding of protein behavior.
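To make the generative stage more concrete, the sketch below shows the closed-form forward (noising) step of a standard DDPM-style diffusion model and one way a segmentation mask could be attached as a conditioning channel. The linear noise schedule, channel-concatenation conditioning and array sizes are assumptions for illustration; the denoising network and the sampling loop are omitted.

```python
import numpy as np


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Common DDPM-style linear variance schedule (an assumed choice)."""
    betas = np.linspace(beta_start, beta_end, T)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in closed form."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return x_t, noise


def make_training_example(tomo_slice, mask, alpha_bars, rng):
    """Build one (network input, target, timestep) triple for a mask-conditioned denoiser.

    The denoiser (not shown) would receive the noisy slice with the mask as an
    extra channel and learn to predict the added noise.
    """
    t = rng.integers(0, len(alpha_bars))
    x_t, noise = q_sample(tomo_slice, t, alpha_bars, rng)
    net_input = np.stack([x_t, mask.astype(x_t.dtype)], axis=0)  # (2, H, W)
    return net_input, noise, t


rng = np.random.default_rng(0)
_, alpha_bars = linear_beta_schedule()
tomo_slice = rng.standard_normal((64, 64))   # stand-in for a normalised cryo-ET slice
mask = np.zeros((64, 64), dtype=bool)
mask[20:30, 20:30] = True                    # stand-in instance mask
net_input, target, t = make_training_example(tomo_slice, mask, alpha_bars, rng)
print(net_input.shape, target.shape, t)
```

At generation time the same mask channel would be fixed while the network denoises pure noise step by step, which is what produces matched image/mask pairs.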
Methods The segmentation pipeline for cryo-electron tomography (cryo-ET) images consists of two stages: training a diffusion model for image generation and training an instance segmentation U-Net using synthetic and real segmentation masks.
1. Diffusion model training:
   a. Data collection: Collect and curate cryo-ET image datasets from the EMPIAR database (https://www.ebi.ac.uk/empiar/).
   b. Architecture design: Select an appropriate architecture for the diffusion model.
   c. Model evaluation: Cryo-ET experts will help assess image quality and fidelity through visual inspection and quantitative measures.
2. Building the segmentation dataset:
   a. Synthetic and real mask generation: Use the trained diffusion model to generate synthetic cryo-ET images. The diffusion process will be seeded from either a real or a synthetic segmentation mask, yielding pairs of cryo-ET images and segmentation masks.
3. Instance segmentation U-Net training:
   a. Architecture design: Choose an appropriate instance segmentation U-Net architecture.
   b. Model evaluation: Evaluate the trained U-Net using precision, recall and F1 score.
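Instance-level precision, recall and F1 require deciding when a predicted instance "matches" a ground-truth one. A minimal sketch, assuming IoU matching with a threshold of 0.5 (a common convention, not specified in the project description) on integer label maps, could look like this; the same code works unchanged on 3D tomogram label volumes.

```python
import numpy as np


def instance_f1(pred_labels, gt_labels, iou_threshold=0.5):
    """Instance-level precision, recall and F1 from two integer label maps (0 = background).

    A predicted instance is a true positive if its IoU with an unmatched
    ground-truth instance exceeds `iou_threshold`.
    """
    pred_ids = [i for i in np.unique(pred_labels) if i != 0]
    gt_ids = [i for i in np.unique(gt_labels) if i != 0]
    matched_gt = set()
    tp = 0
    for p in pred_ids:
        p_mask = pred_labels == p
        for g in gt_ids:
            if g in matched_gt:
                continue
            g_mask = gt_labels == g
            inter = np.logical_and(p_mask, g_mask).sum()
            union = np.logical_or(p_mask, g_mask).sum()
            if inter / union > iou_threshold:
                matched_gt.add(g)
                tp += 1
                break
    fp = len(pred_ids) - tp
    fn = len(gt_ids) - tp
    precision = tp / (tp + fp) if pred_ids else 0.0
    recall = tp / (tp + fn) if gt_ids else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1


# Two ground-truth particles; one is found, one is missed, plus one false detection
gt = np.zeros((20, 20), dtype=int)
gt[2:6, 2:6] = 1
gt[10:14, 10:14] = 2
pred = np.zeros((20, 20), dtype=int)
pred[2:6, 2:6] = 1
pred[15:18, 0:3] = 2
scores = instance_f1(pred, gt)
print(scores)  # (0.5, 0.5, 0.5)
```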
By combining the diffusion model for cryo-ET image generation with the instance segmentation U-Net, this pipeline aims to provide an efficient and accurate approach to segmenting structures in cryo-ET images, facilitating further analysis and interpretation.
References
1. Kwon, Diana. "The secret lives of cells - as never seen before." Nature 598.7882 (2021): 558-560.
2. Moebel, Emmanuel, et al. "Deep learning improves macromolecule identification in 3D cellular cryo-electron tomograms." Nature Methods 18.11 (2021): 1386-1394.
3. Rice, Gavin, et al. "TomoTwin: generalized 3D localization of macromolecules in cryo-electron tomograms with structural data mining." Nature Methods (2023): 1-10.
Contacts: Prof. Thomas Lemmin, Institute of Biochemistry and Molecular Medicine, Bühlstrasse 28, 3012 Bern ([email protected])
Prof. Paolo Favaro, Institute of Computer Science, Neubrückstrasse 10, 3012 Bern ([email protected])
Adding and removing multiple sclerosis lesions in MRI images with diffusion networks
Background: Multiple sclerosis (MS) lesions are the result of demyelination: they appear as dark spots on T1-weighted MRI and as bright spots on FLAIR MRI. Image analysis for MS patients requires both the accurate detection of new and enhancing lesions and the assessment of atrophy via local thickness and/or volume changes in the cortex. Detection of new and growing lesions is possible using deep learning, but is made difficult by the relative lack of training data. Meanwhile, cortical morphometry can be affected by the presence of lesions, so removing lesions prior to morphometry may make it more robust. Existing ‘lesion filling’ methods are rather crude, yielding unrealistic-looking brains in which the borders of the removed lesions are clearly visible.
Aim: Denoising diffusion networks are the current gold standard in MRI image generation [1]. We aim to leverage this technology to remove lesions from, and add lesions to, existing MRI images. This will allow us to create realistic synthetic MRI images for training and validating MS lesion segmentation algorithms, and for investigating the sensitivity of morphometry software to the presence of MS lesions at a variety of lesion load levels.
Materials and Methods: A large, annotated, heterogeneous dataset of MRI data from MS patients, as well as images of healthy controls without white matter lesions, will be available for developing the method. The student will work in a research group with a long track record in applying deep learning methods to neuroimaging data, as well as experience in training denoising diffusion networks.
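One way a trained diffusion network could remove lesions is mask-guided inpainting in the spirit of RePaint (a published technique not named in the project description): at every reverse step, voxels outside the lesion mask are overwritten with a correspondingly noised copy of the real image, so only the lesion region is synthesised. The sketch below shows just that combination logic, without RePaint's resampling schedule; `denoise_step` is a hypothetical placeholder for the trained network, and the schedule and image sizes are assumptions. Adding lesions would follow the same pattern with a lesion-conditioned model.

```python
import numpy as np


def linear_alpha_bars(T, beta_start=1e-4, beta_end=0.02):
    """Cumulative products abar_t of an (assumed) linear noise schedule."""
    return np.cumprod(1.0 - np.linspace(beta_start, beta_end, T))


def denoise_step(x_t, t, rng):
    """HYPOTHETICAL placeholder for one reverse step x_t -> x_{t-1} of a trained network."""
    return 0.9 * x_t + 0.1 * rng.standard_normal(x_t.shape)


def inpaint_step(x_t, t, known, lesion_mask, alpha_bars, rng):
    """RePaint-style step: synthesise inside the lesion mask, keep the real image outside."""
    generated = denoise_step(x_t, t, rng)
    if t == 0:
        noised_known = known                     # final step: clean healthy tissue
    else:
        ab = alpha_bars[t - 1]                   # noise level of x_{t-1}
        noise = rng.standard_normal(known.shape)
        noised_known = np.sqrt(ab) * known + np.sqrt(1.0 - ab) * noise
    return np.where(lesion_mask, generated, noised_known)


def fill_lesions(known, lesion_mask, T=50, seed=0):
    rng = np.random.default_rng(seed)
    alpha_bars = linear_alpha_bars(T, beta_end=0.2)  # short schedule for the sketch
    x = rng.standard_normal(known.shape)             # start from pure noise
    for t in reversed(range(T)):
        x = inpaint_step(x, t, known, lesion_mask, alpha_bars, rng)
    return x


image = np.random.default_rng(1).random((32, 32))    # stand-in for an MRI slice
mask = np.zeros((32, 32), dtype=bool)
mask[12:18, 12:18] = True                             # stand-in lesion mask
filled = fill_lesions(image, mask)
```

Because healthy voxels are re-imposed at every step, the network only has to make the lesion region consistent with its surroundings, which is what should avoid the visible borders of crude lesion filling.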
Nature of the Thesis:
- Literature review: 10%
- Replication of Blob Loss paper: 10%
- Implementation of the sliding window metrics: 10%
- Training on MS lesion segmentation task: 30%
- Extension to other datasets: 20%
- Results analysis: 20%
Fig. Results of an existing lesion filling algorithm, showing inadequate performance
Requirements:
- Interest in or experience with image processing
- Python programming knowledge (PyTorch is a bonus)
- Interest in neuroimaging
Supervisor(s):
PD Dr. Richard McKinley
Institutes: Diagnostic and Interventional Neuroradiology
Center for Artificial Intelligence in Medicine (CAIM), University of Bern
References: [1] Pinaya et al., Brain Imaging Generation with Latent Diffusion Models, accepted in the Deep Generative Models workshop @ MICCAI 2022, https://arxiv.org/abs/2209.07162
Contact : PD Dr Richard McKinley, Support Centre for Advanced Neuroimaging ( [email protected] )
Improving metrics and loss functions for targets with imbalanced size: sliding window Dice coefficient and loss.
Background The Dice coefficient is the most commonly used metric for segmentation quality in medical imaging, and a differentiable version of it is often used as a loss function, in particular for small target classes such as multiple sclerosis lesions. The Dice coefficient is well suited to cases where the target class is in the minority (for example, when segmenting small lesions). However, if lesion sizes are mixed, the loss and the metric are biased towards performance on large lesions, so smaller lesions tend to be missed and overall lesion detection suffers. A recently proposed loss function (blob loss [1]) aims to combat this by treating each connected component of a lesion mask separately, and reports improvements over Dice loss on lesion detection scores in a variety of tasks.
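The size bias is easy to see numerically. The sketch below computes the global Dice coefficient and a simplified per-component score inspired by the blob-loss idea (for each ground-truth component, the other components are excluded from both masks and the per-component Dice values are averaged). This is a metric-style simplification written for illustration, not the loss as implemented in [1]; the toy masks are assumptions.

```python
import numpy as np
from scipy import ndimage


def dice(pred, gt):
    """Dice coefficient 2|P∩G| / (|P| + |G|) for binary masks (1.0 if both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, gt).sum() / denom


def blob_style_dice(pred, gt):
    """Mean Dice over ground-truth connected components (simplified blob-loss idea).

    For each component, voxels of all other components are excluded from both
    masks, so every lesion contributes equally regardless of its size.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    labels, n = ndimage.label(gt)
    scores = []
    for c in range(1, n + 1):
        keep = (labels == c) | (labels == 0)   # this component plus background
        scores.append(dice(pred[keep], labels[keep] == c))
    return float(np.mean(scores)) if scores else 1.0


# One large lesion (10x10) that is found and one small lesion (2x2) that is missed
gt = np.zeros((32, 32), dtype=bool)
gt[2:12, 2:12] = True
gt[24:26, 24:26] = True
pred = np.zeros_like(gt)
pred[2:12, 2:12] = True

print(f"global Dice:     {dice(pred, gt):.3f}")             # 0.980
print(f"blob-style Dice: {blob_style_dice(pred, gt):.3f}")   # 0.500
```

Half of the lesions are missed, yet the global Dice stays near 1 because the missed lesion has only 4 voxels; the per-component average exposes the miss.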
Aim: The aim of this thesis is twofold. First, to benchmark blob loss against a simple, potentially superior loss for instance detection: sliding window Dice loss, in which the Dice loss is calculated over a sliding window across the area/volume of the medical image. Second, to investigate whether a sliding window Dice coefficient is better correlated with lesion-wise detection metrics than the Dice coefficient and may serve as an alternative metric capturing both global and instance-wise detection.
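A minimal 2D sketch of a sliding window Dice coefficient and the corresponding loss is shown below. The window size, stride and the rule for which windows count (those containing ground truth or a prediction above 0.5) are assumptions that the thesis would need to settle experimentally, as would the 3D extension.

```python
import numpy as np


def soft_dice(p, g, eps=1e-6):
    """Soft Dice coefficient; works for binary masks and for probability maps."""
    return (2.0 * (p * g).sum() + eps) / (p.sum() + g.sum() + eps)


def sliding_window_dice(pred, gt, window=8, stride=8):
    """Mean soft Dice over 2D windows that contain ground truth or a confident prediction.

    With stride < window, windows overlap and each pixel is counted several times.
    """
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    scores = []
    for r in range(0, gt.shape[0] - window + 1, stride):
        for c in range(0, gt.shape[1] - window + 1, stride):
            p = pred[r:r + window, c:c + window]
            g = gt[r:r + window, c:c + window]
            if g.any() or (p > 0.5).any():
                scores.append(soft_dice(p, g))
    return float(np.mean(scores)) if scores else 1.0


def sliding_window_dice_loss(pred, gt, **kwargs):
    """Loss form; each Dice term is differentiable in pred when written in an autodiff framework."""
    return 1.0 - sliding_window_dice(pred, gt, **kwargs)


gt = np.zeros((32, 32))
gt[2:12, 2:12] = 1.0      # large lesion, found
gt[24:26, 24:26] = 1.0    # small lesion, missed
pred = np.zeros_like(gt)
pred[2:12, 2:12] = 1.0
print(f"global Dice: {soft_dice(pred, gt):.3f}")                   # 0.980
print(f"sliding-window Dice: {sliding_window_dice(pred, gt):.3f}")  # 0.800
```

The missed small lesion now owns a whole window with Dice near 0, so it pulls the score down far more than it does in the global Dice.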
Materials and Methods: A large, annotated, heterogeneous dataset of MRI data from MS patients will be available for benchmarking the method, as well as our existing codebases for MS lesion segmentation. Extending the method to other diseases and datasets (such as those covered in the blob loss paper) would strengthen the case for publication. The student will work alongside clinicians and engineers carrying out research in multiple sclerosis lesion segmentation, in particular in the context of our ongoing project supported by the CAIM grant.
![](https://www.cvg.unibe.ch/media/uploads/davtyan/2023/03/22/image_1ouV90R.png)
Fig. An annotated MS lesion case, showing the variety of lesion sizes
References: [1] Kofler et al., blob loss: instance imbalance aware loss functions for semantic segmentation, https://arxiv.org/abs/2205.08209
Idempotent and partial skull-stripping in multispectral MRI imaging
Background Skull stripping (or brain extraction) refers to the masking of non-brain tissue from structural MRI imaging. Since 3D MRI sequences allow reconstruction of facial features, many data providers supply data only after skull-stripping, making this a vital tool in data sharing. Furthermore, skull-stripping is an important pre-processing step in many neuroimaging pipelines, even in the deep-learning era: while many methods could now operate on data with skull present, they have been trained only on skull-stripped data and therefore produce spurious results on data with the skull present.
High-quality skull-stripping algorithms based on deep learning are now widely available: the most prominent example is HD-BET [1]. A major downside of HD-BET is its behaviour on datasets to which skull-stripping has already been applied: in this case the algorithm falsely identifies brain tissue as skull and masks it. A skull-stripping algorithm F not exhibiting this behaviour would be idempotent: F(F(x)) = F(x) for any image x. Furthermore, legacy datasets from before the availability of high-quality skull-stripping algorithms may still contain images that have been inadequately skull-stripped: currently the only way to improve the skull-stripping of these data is to go back to the original data source or to correct the skull-stripping manually, which is time-consuming and error-prone.
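The idempotence property F(F(x)) = F(x) translates directly into a check that can be run on any candidate model. The sketch below measures the gap between one and two applications; the two "strippers" are hypothetical toy functions, one idempotent and one that peels off a layer of tissue every time it is applied, mimicking the failure mode described above.

```python
import numpy as np
from scipy import ndimage


def idempotence_gap(strip, image):
    """Max absolute difference between F(F(x)) and F(x); 0 means F is idempotent on x."""
    once = strip(image)
    twice = strip(once)
    return float(np.max(np.abs(twice - once)))


def threshold_strip(image, level=0.5):
    """Toy stripper: zero every voxel at or below a fixed intensity (idempotent)."""
    return np.where(image > level, image, 0.0)


def eroding_strip(image):
    """Toy stripper that always removes one voxel layer from the remaining tissue.

    Applied to already-stripped data it removes genuine tissue again.
    """
    mask = ndimage.binary_erosion(image > 0)
    return np.where(mask, image, 0.0)


# Synthetic "head": a disk of tissue on an empty background
yy, xx = np.mgrid[:64, :64]
head = np.where((yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2, 1.0, 0.0)

print(idempotence_gap(threshold_strip, head))  # 0.0
print(idempotence_gap(eroding_strip, head))    # 1.0
```

For a real network the check would be run over a validation set, possibly with a tolerance, since small numerical differences between passes are expected.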
Aim: In this project, the student will develop an idempotent skull-stripping network that can also handle partially skull-stripped inputs. In the best case, the network will operate well on a large subset of the data we work with (e.g. structural MRI, diffusion-weighted MRI, perfusion-weighted MRI, susceptibility-weighted MRI, at a variety of field strengths) to maximize the future applicability of the network across the teams in our group.
Materials and Methods: Multiple datasets, both publicly available and internal (encompassing thousands of 3D volumes) will be available. Silver standard reference data for standard sequences at 1.5T and 3T can be generated using existing tools such as HD-BET: for other sequences and field strengths semi-supervised learning or methods improving robustness to domain shift may be employed. Robustness to partial skull-stripping may be induced by a combination of learning theory and model-based approaches.
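Partially stripped training inputs could be simulated from fully annotated data. The sketch below shows one simple, assumed model of inadequate legacy stripping: non-brain tissue is kept only within a random margin around the brain mask. The margin range and the synthetic disks are illustrative choices, not part of the project specification.

```python
import numpy as np
from scipy import ndimage


def simulate_partial_strip(image, brain_mask, rng, max_margin=5):
    """Return (partially stripped input, fully stripped target, margin used).

    Non-brain tissue is kept only within a random margin of 1..max_margin voxels
    around the brain mask.
    """
    margin = int(rng.integers(1, max_margin + 1))
    # iterations must be >= 1: scipy treats iterations < 1 as "dilate until nothing changes"
    keep = ndimage.binary_dilation(brain_mask, iterations=margin)
    partial = np.where(keep, image, 0.0)
    target = np.where(brain_mask, image, 0.0)
    return partial, target, margin


yy, xx = np.mgrid[:64, :64]
r2 = (yy - 32) ** 2 + (xx - 32) ** 2
brain_mask = r2 < 15 ** 2
image = np.where(r2 < 25 ** 2, 1.0, 0.0) + brain_mask   # non-brain tissue plus brain
partial, target, margin = simulate_partial_strip(image, brain_mask, np.random.default_rng(0))
print(margin, (partial != 0).sum(), (target != 0).sum())
```

Training pairs (partial, target) together with (target, target) would push the network towards both robustness to partial stripping and idempotence.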
![](https://www.cvg.unibe.ch/media/uploads/davtyan/2023/03/22/image_x6DmXfk.png)
- Dataset curation: 10%
- Idempotent skull-stripping model building: 30%
- Modelling of partial skull-stripping: 10%
- Extension of model to handle partial skull: 30%
- Results analysis: 10%
Fig. An example of failed skull-stripping requiring manual correction
References: [1] Isensee, F, Schell, M, Pflueger, I, et al. Automated brain extraction of multisequence MRI using artificial neural networks. Hum Brain Mapp. 2019; 40: 4952–4964. https://doi.org/10.1002/hbm.24750
Automated leaf detection and leaf area estimation (for Arabidopsis thaliana)
Correlating plant phenotypes such as leaf area or number of leaves with the genotype (i.e. changes in DNA) is a common goal for plant breeders and molecular biologists. Such data can help not only to understand fundamental processes in nature, but also to improve ecotypes, e.g., so that they perform better under climate change or require less fertiliser. However, collecting data for many plants is very time-consuming, so automated data acquisition is necessary.
The project aims at building a machine learning model to automatically detect plants in top-view images (see examples below), segment their leaves (see Fig C) and to estimate the leaf area. This information will then be used to determine the leaf area of different Arabidopsis ecotypes. The project will be carried out in collaboration with researchers of the Institute of Plant Sciences at the University of Bern. It will also involve the design and creation of a dataset of plant top-views with the corresponding annotation (provided by experts at the Institute of Plant Sciences).
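Once leaves are segmented, leaf count and area reduce to counting labelled pixels. The sketch below assumes a top-view instance label map and a known, uniform pixel size (for example from a calibration target in the scene, which is an assumption here); it ignores leaf curvature and overlap, which a real pipeline would have to consider.

```python
import numpy as np
from scipy import ndimage


def leaf_statistics(leaf_labels, mm_per_pixel):
    """Per-leaf areas (mm^2) and leaf count from an instance label map (0 = background)."""
    ids, counts = np.unique(leaf_labels[leaf_labels > 0], return_counts=True)
    areas_mm2 = counts * mm_per_pixel ** 2
    return dict(zip(ids.tolist(), areas_mm2.tolist())), len(ids)


mask = np.zeros((20, 20), dtype=bool)
mask[2:6, 2:6] = True       # leaf covering 16 pixels
mask[10:15, 10:14] = True   # leaf covering 20 pixels
labels, _ = ndimage.label(mask)   # splits a binary mask only where leaves do not touch
areas, n_leaves = leaf_statistics(labels, mm_per_pixel=0.5)
print(n_leaves, areas)  # 2 {1: 4.0, 2: 5.0}
```

Touching or overlapping leaves cannot be separated by connected components alone, which is why the project calls for a learned instance segmentation model.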
![](https://www.cvg.unibe.ch/media/uploads/davtyan/2022/09/22/1.png)
Contact: Prof. Dr. Paolo Favaro ( [email protected] )
Master Projects at the ARTORG Center
The Gerontechnology and Rehabilitation group at the ARTORG Center for Biomedical Engineering is offering multiple MSc thesis projects to students who are interested in working with real patient data, artificial intelligence and machine learning algorithms. The goal of these projects is to transfer the findings to the clinic in order to solve today’s healthcare problems and thus improve patients’ quality of life.
- Assessment of Digital Biomarkers at Home by Radar. [PDF]
- Comparison of Radar, Seismograph and Ballistocardiography to Monitor Sleep at Home. [PDF]
- Sentiment Analysis in Speech. [PDF]
Contact: Dr. Stephan Gerber ( [email protected] )
Internship in Computational Imaging at Prophesee
A 6-month internship at Prophesee, Grenoble, is offered to a talented Master's student.
The topic of the internship is working on burst imaging following the work of Sam Hasinoff, and exploring ways to improve it using event-based vision.
A compensation to cover the expenses of living in Grenoble is offered. Only students who have the legal right to work in France can apply.
Anyone interested can send an email with their CV to Daniele Perrone ( [email protected] ).
Using machine learning applied to wearables to predict mental health
This Master’s project lies at the intersection of psychiatry and computer science and aims to use machine learning techniques to improve health. Using sensors to detect sleep and waking behavior has as-yet unexplored potential to reveal insights into health. In this study, we use a watch-like device called an actigraph, which tracks motion to quantify sleep behavior and waking activity. The study participants are healthy and depressed adolescents who wear actigraphs for a year, during which we query their mental health status monthly using online questionnaires. In this master’s thesis, we aim to use machine learning methods to predict mental health from the actigraph data. The ability to predict mental health crises from sleep and wake behavior would provide an opportunity for intervention, significantly impacting the lives of patients and their families. This master’s thesis is a collaboration between Professor Paolo Favaro at the Institute of Computer Science ( [email protected] ) and Dr Leila Tarokh at the Universitäre Psychiatrische Dienste (UPD) ( [email protected] ). We are looking for a highly motivated individual interested in bridging disciplines.
Bachelor or Master Projects at the ARTORG Center
The Gerontechnology and Rehabilitation group at the ARTORG Center for Biomedical Engineering is offering multiple BSc and MSc thesis projects to students who are interested in working with real patient data, artificial intelligence and machine learning algorithms. The goal of these projects is to transfer the findings to the clinic in order to solve today’s healthcare problems and thus improve patients’ quality of life.
- Machine Learning Based Gait-Parameter Extraction by Using Simple Rangefinder Technology. [PDF]
- Detection of Motion in Video Recordings [PDF]
- Home-Monitoring of Elderly by Radar [PDF]
- Gait feature detection in Parkinson's Disease [PDF]
- Development of an arthroscopic training device using virtual reality [PDF]
Contact: Dr. Stephan Gerber ( [email protected] ), Michael Single ( [email protected] )
Dynamic Transformer
Level: bachelor.
Visual Transformers have obtained state-of-the-art classification accuracies [ViT, DeiT, T2T, BoTNet]. A mixture of experts can be used to increase the capacity of a neural network by learning instance-dependent execution pathways [MoE]. In this research project we aim to push transformers to their limit by combining their dynamic attention with MoEs. Compared to the Switch Transformer [Switch], we will use a much more efficient formulation of mixing [CondConv, DynamicConv], and we will apply this idea in the attention part of the transformer rather than in the fully connected layer.
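To illustrate what "mixing" inside attention could mean, the sketch below generates the Q/K/V projection matrices of a single attention head as an input-dependent, softmax-weighted combination of K expert matrices, in the style of DynamicConv (CondConv uses sigmoid weights instead). Routing on the mean token, a single head and the shapes are assumptions; this is one possible formulation, not the project's design.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_attention(x, expert_qkv, router):
    """Single-head self-attention with input-dependent, mixed Q/K/V projections.

    x:          (n_tokens, d) token embeddings
    expert_qkv: (K, 3, d, d) expert weights for the Q, K and V projections
    router:     (d, K) routing matrix applied to the mean token
    """
    mix = softmax(x.mean(axis=0) @ router)                 # (K,) weights, sum to 1
    w_q, w_k, w_v = np.tensordot(mix, expert_qkv, axes=1)  # each (d, d)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(x.shape[1]))          # (n_tokens, n_tokens)
    return attn @ v, mix


rng = np.random.default_rng(0)
n_tokens, d, n_experts = 10, 16, 4
x = rng.standard_normal((n_tokens, d))
experts = rng.standard_normal((n_experts, 3, d, d)) / np.sqrt(d)
router = rng.standard_normal((d, n_experts))
out, mix = dynamic_attention(x, experts, router)
print(out.shape, mix.round(3))
```

Only one d x d matrix per projection is materialised after mixing, so the extra cost over plain attention is small compared with running K separate experts, which is the efficiency argument behind CondConv/DynamicConv-style mixing.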
- Input dependent attention kernel generation for better transformer layers.
Publication Opportunity: Dynamic Neural Networks Meets Computer Vision (a CVPR 2021 Workshop)
Extensions:
- The same idea could be extended to other ViT/Transformer based models [DETR, SETR, LSTR, TrackFormer, BERT]
Related Papers:
- Visual Transformers: Token-based Image Representation and Processing for Computer Vision [ViT]
- DeiT: Data-efficient Image Transformers [DeiT]
- Bottleneck Transformers for Visual Recognition [BoTNet]
- Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [T2TViT]
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer [MoE]
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity [Switch]
- CondConv: Conditionally Parameterized Convolutions for Efficient Inference [CondConv]
- Dynamic Convolution: Attention over Convolution Kernels [DynamicConv]
- End-to-End Object Detection with Transformers [DETR]
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [SETR]
- End-to-end Lane Shape Prediction with Transformers [LSTR]
- TrackFormer: Multi-Object Tracking with Transformers [TrackFormer]
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [BERT]
Contact: Sepehr Sameni
Visual Transformers have obtained state-of-the-art classification accuracies for 2D images [ViT, DeiT, T2T, BoTNet]. In this project, we aim to extend the same ideas to 3D data (videos), which requires a more efficient attention mechanism [Performer, Axial, Linformer]. To accelerate training, we could use the [Multigrid] technique.
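As an example of a cheaper attention pattern for video, the sketch below factorises attention over a (T, N, d) clip into attention along time followed by attention across space, in the spirit of axial attention [Axial]. It omits learned projections and multiple heads, and the clip sizes are illustrative assumptions.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attend(x):
    """Plain dot-product self-attention over the second-to-last axis (no projections)."""
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    return softmax(scores) @ x


def axial_video_attention(video_tokens):
    """Factorised attention for a (T, N, d) clip: along time, then across space.

    Full attention over all T*N tokens costs O((T*N)^2 * d); attending along one
    axis at a time costs O(T^2 * N * d + N^2 * T * d).
    """
    x = np.swapaxes(video_tokens, 0, 1)    # (N, T, d): each location attends over time
    x = np.swapaxes(attend(x), 0, 1)       # back to (T, N, d)
    return attend(x)                       # each frame attends over its N patches


rng = np.random.default_rng(0)
clip = rng.standard_normal((8, 49, 32))    # 8 frames, 7x7 patches, 32-dim tokens
out = axial_video_attention(clip)
print(out.shape)  # (8, 49, 32)
```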
- Better video understanding by attention blocks.
Publication Opportunity: LOVEU (a CVPR workshop) , Holistic Video Understanding (a CVPR workshop) , ActivityNet (a CVPR workshop)
- Rethinking Attention with Performers [Performer]
- Axial Attention in Multidimensional Transformers [Axial]
- Linformer: Self-Attention with Linear Complexity [Linformer]
- A Multigrid Method for Efficiently Training Video Models [Multigrid]
GIRAFFE is a recently introduced GAN that can generate scenes via composition with minimal supervision [GIRAFFE]. Generative methods can implicitly learn interpretable representations, as can be seen in GAN image interpretations [GANSpace, GanLatentDiscovery]. Decoding GIRAFFE could give us per-object interpretable representations that could be used for scene manipulation, data augmentation, scene understanding, semantic segmentation, pose estimation [iNeRF], and more.
To invert a GIRAFFE model, we will first train the generative model on the CLEVR and CompCars datasets, then add a decoder to the pipeline and train the resulting autoencoder. The task can be made easier by providing the number of objects in the scene and/or their positions.
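The core inversion idea can be shown in its simplest form: keep the generator frozen, sample latent codes, render them, and train a network that maps renderings back to their codes. In the toy sketch below a random linear map stands in for a pretrained GIRAFFE model and a linear map trained by gradient descent stands in for the inverting network; both are assumptions for illustration. A real setup would use a CNN, GIRAFFE's per-object shape, appearance and pose codes, and likely an additional image-space reconstruction loss.

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_x, n = 4, 16, 256

# Frozen "generator": a random linear map standing in for a pretrained model
A = rng.standard_normal((d_x, d_z)) / np.sqrt(d_x)
Z = rng.standard_normal((d_z, n))      # latent codes sampled from the prior
X = A @ Z                              # corresponding generated "images"

# Linear inverting network B trained on the latent-regression loss
# L(B) = mean_i ||B x_i - z_i||^2, using only samples from the generator
B = np.zeros((d_z, d_x))
lr = 0.05
for _ in range(3000):
    grad = 2.0 / n * (B @ X - Z) @ X.T
    B -= lr * grad

print(np.round(B @ A, 3))  # close to the identity: B inverts the generator on its range
```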
Goals:
Scene Manipulation and Decomposition by Inverting the GIRAFFE
Publication Opportunity: DynaVis 2021 (a CVPR workshop on Dynamic Scene Reconstruction)
Related Papers:
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [GIRAFFE]
- Neural Scene Graphs for Dynamic Scenes
- pixelNeRF: Neural Radiance Fields from One or Few Images [pixelNeRF]
- NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis [NeRF]
- Neural Volume Rendering: NeRF And Beyond
- GANSpace: Discovering Interpretable GAN Controls [GANSpace]
- Unsupervised Discovery of Interpretable Directions in the GAN Latent Space [GanLatentDiscovery]
- Inverting Neural Radiance Fields for Pose Estimation [iNeRF]
Quantized ViT
Visual Transformers have obtained state-of-the-art classification accuracies [ViT, CLIP, DeiT], but the best ViT models are extremely compute-heavy, and even running them only for inference (without backpropagation) is expensive. Running transformers cheaply through quantization is not a new problem: it has been tackled before for BERT [BERT] in NLP [Q-BERT, Q8BERT, TernaryBERT, BinaryBERT]. In this project we will try to quantize pretrained ViT models.
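As a baseline for what quantization does to a weight matrix, the sketch below applies the simplest post-training scheme: symmetric, per-tensor int8 quantization. The matrix size matches a ViT-Base hidden dimension of 768, but the random weights are a stand-in; per-channel scales, activation quantization and lower bit widths (as in the BERT work cited above) are what the project would explore beyond this.

```python
import numpy as np


def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q with q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.standard_normal((768, 768)).astype(np.float32) * 0.02  # stand-in projection weights
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(f"max abs error {np.abs(w - w_hat).max():.2e} (bound scale/2 = {scale / 2:.2e})")
print(f"memory: {w.nbytes} bytes -> {q.nbytes} bytes")
```

Rounding bounds the per-weight error by half a quantization step, and storage drops by 4x; whether accuracy survives depends on how these errors accumulate through the attention layers, which is the question the project would answer empirically.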
Quantizing ViT models for faster inference and smaller models without losing accuracy
Publication Opportunity: Binary Networks for Computer Vision 2021 (a CVPR workshop)
Extensions:
- Having a fast pipeline for image inference with ViT will allow us to dig deeply into the attention of ViT and analyze it. We might be able to prune some attention heads or replace them with static patterns (like local convolution or dilated patterns). We might even be able to replace standard attention with Performer attention and increase throughput further [Performer].
- The same idea could be extended to other ViT based models [DETR, SETR, LSTR, TrackFormer, CPTR, BoTNet, T2TViT]
- Learning Transferable Visual Models From Natural Language Supervision [CLIP]
- Visual Transformers: Token-based Image Representation and Processing for Computer Vision [ViT]
- DeiT: Data-efficient Image Transformers [DeiT]
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [BERT]
- Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT [Q-BERT]
- Q8BERT: Quantized 8Bit BERT [Q8BERT]
- TernaryBERT: Distillation-aware Ultra-low Bit BERT [TernaryBERT]
- BinaryBERT: Pushing the Limit of BERT Quantization [BinaryBERT]
- Rethinking Attention with Performers [Performer]
- End-to-End Object Detection with Transformers [DETR]
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [SETR]
- End-to-end Lane Shape Prediction with Transformers [LSTR]
- TrackFormer: Multi-Object Tracking with Transformers [TrackFormer]
- CPTR: Full Transformer Network for Image Captioning [CPTR]
- Bottleneck Transformers for Visual Recognition [BoTNet]
- Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [T2TViT]
Multimodal Contrastive Learning
Recently, contrastive learning has gained a lot of attention for self-supervised image representation learning [SimCLR, MoCo]. Contrastive learning can be extended to multimodal data, such as videos (images and audio) [CMC, CoCLR]. Most contrastive methods require large batch sizes (or large memory pools), which makes them expensive to train. In this project we are going to use contrastive methods that do not depend on the batch size [SwAV, BYOL, SimSiam] to train multimodal representation extractors.
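To see why such methods avoid large batches, consider a SimSiam-style objective applied across two modalities: each example's loss compares only that example's two views, so no negatives from the rest of the batch are needed. The sketch below is a forward-pass illustration in NumPy; the modality names and embedding sizes are hypothetical, and the stop-gradient on the projection outputs, essential in SimSiam, would have to be applied in an autodiff framework.

```python
import numpy as np


def neg_cosine(p, z):
    """Mean negative cosine similarity between rows of p and z.

    In SimSiam, z is treated as a constant (stop-gradient); with autodiff this
    would be z.detach() in PyTorch or jax.lax.stop_gradient(z) in JAX.
    """
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -np.mean(np.sum(p * z, axis=1))


def cross_modal_simsiam_loss(p_video, z_video, p_audio, z_audio):
    """Symmetric SimSiam-style loss across modalities: each modality's predictor
    output is pulled towards the other modality's projection."""
    return 0.5 * neg_cosine(p_video, z_audio) + 0.5 * neg_cosine(p_audio, z_video)


rng = np.random.default_rng(0)
z = rng.standard_normal((8, 32))                      # 8 examples, 32-dim embeddings
print(cross_modal_simsiam_loss(z, z, z, z))           # approximately -1 (perfectly aligned)
```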
Our main goal is to compare the proposed method with the CMC baseline, so we will be working with STL10, ImageNet, UCF101, HMDB51, and NYU Depth-V2 datasets.
Inspired by recent work on smaller datasets [ConVIRT, CPD], we could speed up training by starting with two pretrained single-modal models and fine-tuning them with the proposed method.
- Extending SwAV to multimodal datasets
- Gaining a better understanding of BYOL
Publication Opportunity: MULA 2021 (a CVPR workshop on Multimodal Learning and Applications)
- Most knowledge distillation methods for contrastive learners also use large batch sizes (or memory pools) [CRD, SEED], the proposed method could be extended for knowledge distillation.
- One could easily extend this idea to multiview learning: for example, two different networks could work on the same input and be trained with contrastive learning; the exchange of inductive biases between the models may lead to better models [DeiT].
- Self-supervised Co-training for Video Representation Learning [CoCLR]
- Learning Spatiotemporal Features via Video and Text Pair Discrimination [CPD]
- Audio-Visual Instance Discrimination with Cross-Modal Agreement [AVID-CMA]
- Self-Supervised Learning by Cross-Modal Audio-Video Clustering [XDC]
- Contrastive Multiview Coding [CMC]
- Contrastive Learning of Medical Visual Representations from Paired Images and Text [ConVIRT]
- A Simple Framework for Contrastive Learning of Visual Representations [SimCLR]
- Momentum Contrast for Unsupervised Visual Representation Learning [MoCo]
- Bootstrap your own latent: A new approach to self-supervised Learning [BYOL]
- Exploring Simple Siamese Representation Learning [SimSiam]
- Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [SwAV]
- Contrastive Representation Distillation [CRD]
- SEED: Self-supervised Distillation For Visual Representation [SEED]
Robustness of Neural Networks
Neural networks have been found to achieve surprising performance in several tasks such as classification, detection and segmentation. However, they are also very sensitive to small (controlled) changes to the input: it has been shown that changes to an image that are invisible to the naked eye can lead the network to output an incorrect label. This thesis will study recent progress in this area and aim to build a procedure that lets a trained network self-assess its reliability in classification or in one of the other popular computer vision tasks.
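The effect of tiny input changes can be reproduced with the Fast Gradient Sign Method (FGSM), a standard attack from the adversarial-examples literature. The sketch below uses a linear classifier rather than a deep network, which makes the mechanism explicit: shifting every pixel by only `eps` lowers the classifier's margin by exactly `eps * ||w||_1`, a quantity that grows with the input dimension. The weights and image are random stand-ins, not a trained model.

```python
import numpy as np


def margin(w, b, x, y):
    return y * (x @ w + b)


def logistic_loss(w, b, x, y):
    """Logistic loss log(1 + exp(-y (w.x + b))) for a label y in {-1, +1}."""
    return np.log1p(np.exp(-margin(w, b, x, y)))


def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method: shift every input value by eps along the sign of dLoss/dx."""
    grad_x = -y * w / (1.0 + np.exp(margin(w, b, x, y)))
    return x + eps * np.sign(grad_x)


rng = np.random.default_rng(0)
d = 28 * 28                                  # e.g. a flattened 28x28 grey-scale image
w, b = rng.standard_normal(d) * 0.05, 0.0    # stand-in for a trained linear classifier
x, y = rng.random(d), 1
eps = 0.01                                   # 1% of the [0, 1] intensity range per pixel
x_adv = fgsm(w, b, x, y, eps)
print(f"loss before: {logistic_loss(w, b, x, y):.3f}, after: {logistic_loss(w, b, x_adv, y):.3f}")
print(f"margin drop: {margin(w, b, x, y) - margin(w, b, x_adv, y):.3f}, eps * ||w||_1: {eps * np.abs(w).sum():.3f}")
```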
Contact: Paolo Favaro
Master's projects at the sitem Center
The Personalised Medicine Research Group at the sitem Center for Translational Medicine and Biomedical Entrepreneurship is offering multiple MSc thesis projects to Biomedical Engineering MSc students that may also be of interest to computer science students.
- Automated quantification of cartilage quality for hip treatment decision support. [PDF]
- Automated quantification of massive rotator cuff tears from MRI. [PDF]
- Deep learning-based segmentation and fat fraction analysis of the shoulder muscles using quantitative MRI. [PDF]
- Unsupervised Domain Adaptation for Cross-Modality Hip Joint Segmentation. [PDF]
Contact: Dr. Kate Gerber
Internships/Master thesis @ Chronocam
3-6 month internships on event-based computer vision. Chronocam is a rapidly growing startup developing event-based technology, with more than 15 PhDs working on problems like tracking, detection, classification, SLAM, etc. Event-based computer vision has the potential to solve many long-standing problems in traditional computer vision, and this is a very exciting time as this potential is becoming more and more tangible in many real-world applications. For next year we are looking for motivated Master's and PhD students with good software engineering skills (C++ and/or Python), and preferably a good computer vision and deep learning background. PhD internships will be more research-focused and may lead to a publication. For each intern we offer a compensation to cover the expenses of living in Paris. Some of the topics we want to explore:
- Photo-realistic image synthesis and super-resolution from event-based data (PhD)
- Self-supervised representation learning (PhD)
- End-to-end Feature Learning for Event-based Data
- Bio-inspired Filtering using Spiking Networks
- On-the-fly Compression of Event-based Streams for Low-Power IoT Cameras
- Tracking of Multiple Objects with a Dual-Frequency Tracker
- Event-based Autofocus
- Stabilizing an Event-based Stream using an IMU
- Crowd Monitoring for Low-power IoT Cameras
- Road Extraction from an Event-based Camera Mounted in a Car for Autonomous Driving
- Sign detection from an Event-based Camera Mounted in a Car for Autonomous Driving
- High-frequency Eye Tracking
Email with attached CV to Daniele Perrone at [email protected] .
Contact: Daniele Perrone
Object Detection in 3D Point Clouds
Today we have many 3D scanning techniques that allow us to capture the shape and appearance of objects. It is easier than ever to scan real 3D objects and transform them into a digital model for further processing, such as modeling, rendering or animation. However, the output of a 3D scanner is often a raw point cloud with little to no annotation. The unstructured nature of the point cloud representation makes it difficult to process, e.g. for surface reconstruction. One application is the detection and segmentation of an object of interest. In this project, the student is challenged to design a system that takes a point cloud (a 3D scan) as input and outputs the names of the objects contained in the scan. This output can then be used to eliminate outliers or points that belong to the background. The approach involves collecting a large dataset of 3D scans and training a neural network on it.
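A key requirement for a network that consumes raw, unordered point clouds is that its output must not depend on the order of the points. The sketch below shows the idea behind PointNet-style architectures, one common choice that the project does not prescribe: a shared MLP applied to every point, followed by a symmetric max-pool. The weights are random and untrained; layer sizes are assumptions.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class TinyPointNet:
    """Minimal PointNet-style classifier with random, untrained weights.

    A shared MLP is applied to every point independently, then max-pooling over
    points produces a global feature that does not depend on point order.
    """

    def __init__(self, n_classes, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((3, hidden)) * 0.5
        self.w2 = rng.standard_normal((hidden, hidden)) * 0.1
        self.w_out = rng.standard_normal((hidden, n_classes)) * 0.1

    def logits(self, points):
        """points: (n_points, 3) array of xyz coordinates."""
        per_point = relu(relu(points @ self.w1) @ self.w2)  # (n_points, hidden)
        global_feature = per_point.max(axis=0)              # order-independent pooling
        return global_feature @ self.w_out                  # (n_classes,)


cloud = np.random.default_rng(1).standard_normal((500, 3))
model = TinyPointNet(n_classes=10)
perm = np.random.default_rng(2).permutation(500)
print(np.allclose(model.logits(cloud), model.logits(cloud[perm])))  # True
```

Since a scan can contain several objects, the class logits would typically be read with one sigmoid per class (multi-label) rather than a single softmax.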
Contact: Adrian Wälchli
Shape Reconstruction from a Single RGB Image or Depth Map
A photograph accurately captures the world at a moment in time and from a specific perspective. Since it is a projection of 3D space onto a 2D image plane, the depth information is lost. Is it possible to restore it, given only a single photograph? In general, the answer is no. The problem is ill-posed: many different plausible depth maps exist, and there is no way of telling which one is correct. However, if we cover one of our eyes, we are still able to recognize objects and estimate how far away they are. This motivates an approach in which prior knowledge is leveraged to reduce the ill-posedness of the problem. Such a prior could be learned by a deep neural network trained on many images and depth maps.
CNN Based Deblurring on Mobile
Deblurring has many applications in everyday life. It is particularly useful when taking pictures with handheld devices (e.g. smartphones), where camera shake can destroy important details. It is therefore desirable to run a good deblurring algorithm directly on the device. In this project, the student will implement and optimize a state-of-the-art deblurring method based on a deep neural network for deployment on mobile phones (Android). The goal is to reduce the number of network weights, and hence the memory footprint, while preserving the quality of the deblurred images. The result will be a camera app that automatically deblurs pictures and lets the user choose between keeping the original or the deblurred image.
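The project does not prescribe how the weights should be reduced. One common option is magnitude pruning, sketched below under the assumption that the weights are a flat list of floats and that `sparsity` (a hypothetical parameter) is the fraction of weights to zero out. A real deployment would prune per layer and usually fine-tune afterwards.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    Assumption: `weights` is a flat list of floats (one layer, flattened).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


w = [0.5, -0.01, 0.2, -0.9, 0.03]
print(magnitude_prune(w, 0.4))  # [0.5, 0.0, 0.2, -0.9, 0.0]
```

Zeroed weights only save memory once they are stored in a sparse or compressed format, so pruning is typically combined with such a format or with quantization.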
Depth from Blur
If an object in front of the camera, or the camera itself, moves while the shutter is open, the region of motion becomes blurred because the incoming light is accumulated at different positions on the sensor. If the camera moves, there is also parallax. A motion-blurred image therefore contains depth information. In this project, the student will tackle the problem of recovering a depth map from a motion-blurred image. This includes collecting a large dataset of blurred and sharp images or videos using a pair or triplet of GoPro action cameras: two cameras are used in stereo to estimate the depth map, and the third captures the blurred frames. These data are then used to train a convolutional neural network that predicts the depth map from the blurry image.
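For the stereo pair that provides ground-truth depth, the standard relation for rectified cameras is Z = f·B/d (focal length f in pixels, baseline B, disparity d in pixels). A minimal sketch, assuming rectified images; the camera values in the example are hypothetical.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity means a point at infinity (or an invalid match).
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Hypothetical camera: 1000 px focal length, 10 cm baseline.
print(disparity_to_depth(20.0, 1000.0, 0.1))  # 5.0 (metres)
```

Because depth is inversely proportional to disparity, distant points have small disparities and their depth estimates are the least precise.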
Unsupervised Clustering Based on Pretext Tasks
The idea of this project is to have two types of neural networks that work together: one network A assigns images to k clusters, and k (simple) networks of type B each perform a self-supervised task on one of those clusters. The goal of all the networks is to make the k networks of type B perform well on the task. The assumption is that clustering into semantically similar groups will help the networks of type B perform well. This could be done on the MNIST dataset, with B being linear classifiers and the task being rotation prediction.
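One way to read the joint objective is: each image should go to the cluster whose type-B network solves the pretext task best, and the total pretext loss should be minimized. The sketch below shows only that hard-assignment step, assuming each type-B network is represented by a hypothetical per-image loss function. In the project itself, network A would be trained (for example with a soft assignment) rather than computing this argmin directly.

```python
from typing import Callable, List, Sequence


def assign_to_clusters(images: Sequence, losses: List[Callable]) -> List[int]:
    """Hard-assign each image to the type-B network with the lowest pretext loss."""
    assignments = []
    for img in images:
        per_cluster = [loss(img) for loss in losses]
        # argmin over the k clusters
        assignments.append(min(range(len(losses)), key=lambda k: per_cluster[k]))
    return assignments


def total_objective(images: Sequence, losses: List[Callable], assignments: List[int]) -> float:
    """Sum of pretext-task losses, each image scored only by its assigned network."""
    return sum(losses[k](img) for img, k in zip(images, assignments))


# Hypothetical stand-ins for k = 2 type-B networks: each is good near its own centre.
toy_losses = [lambda x: (x - 0.0) ** 2, lambda x: (x - 10.0) ** 2]
toy_images = [1.0, 9.0, 2.0, 11.0]
assignment = assign_to_clusters(toy_images, toy_losses)
print(assignment)  # [0, 1, 0, 1]
print(total_objective(toy_images, toy_losses, assignment))  # 7.0
```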
Adversarial Data-Augmentation
The student designs a data augmentation network that transforms training images in such a way that image realism is preserved (e.g. with a constrained spatial transformer network) and the transformed images are more difficult to classify (trained via adversarial loss against an image classifier). The model will be evaluated for different data settings (especially in the low data regime), for example on the MNIST and CIFAR datasets.
Unsupervised Learning of Lip-reading from Videos
People with sensory impairments (hearing, speech, vision) depend heavily on assistive technologies to communicate and navigate in everyday life. Today's mass production of media content makes it impossible to manually translate everything into a common language for assistive technologies, e.g. captions or sign language. In this project, the student employs a neural network to learn a representation of lip movement in videos in an unsupervised fashion, possibly with an encoder-decoder structure in which the decoder reconstructs the audio signal. This requires collecting a large dataset of videos (e.g. from YouTube) of speakers or conversations in which lip movement is visible. The outcome will be a neural network that learns an audio-visual representation of lip movement in videos, which can then be leveraged to generate captions for hearing-impaired people.
Learning to Generate Topographic Maps from Satellite Images
Satellite images have many applications, e.g. in meteorology, geography, education, cartography and warfare. They are an accurate and detailed depiction of the Earth's surface from above. Although it is relatively simple to collect many satellite images automatically, challenges arise when processing them for use in navigation and cartography. The idea of this project is to automatically convert an arbitrary satellite image, e.g. of a city, into a map of simple 2D shapes (streets, houses, forests) and label them with colors (semantic segmentation). The student will collect a dataset of satellite images and topographic maps and train a deep neural network that learns to map from one domain to the other. The data could be obtained from Google Maps or a similar source.
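The topic does not specify an evaluation metric. Mean intersection-over-union (mIoU) is a standard choice for semantic segmentation; a minimal sketch over flattened per-pixel class labels, where the class ids (0 = street, 1 = house, 2 = forest) are hypothetical.

```python
def mean_iou(y_true, y_pred, num_classes):
    """Mean intersection-over-union over classes present in either labeling."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have equal length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        union = sum(1 for t, p in zip(y_true, y_pred) if t == c or p == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


truth = [0, 0, 1, 1, 2]  # hypothetical flattened ground-truth map
pred = [0, 1, 1, 1, 2]   # hypothetical network output
print(mean_iou(truth, pred, 3))  # about 0.722, i.e. (1/2 + 2/3 + 1) / 3
```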
Optimization of OmniMotion, a tracking algorithm
Martí Farré Farrús · June 2024.
This thesis presents Quasi-OmniFastTrack, an improved version of the OmniMotion algorithm for long-term pixel tracking in videos. The key contribution is reducing the computational cost and training time of OmniMotion while maintaining comparable tracking performance. The main bottleneck in OmniMotion was identified as the NeRF network used for 3D scene representation. Building on the work introduced in OmniFastTrack (hence the name), Quasi-OmniFastTrack replaces this network with a pre-trained depth estimation model, significantly reducing training time. The invertible neural network for mapping between local and canonical coordinates is retained, but optimized depths are used to lift 2D pixels to 3D. Experiments show that Quasi-OmniFastTrack reduces training time by over 50% compared to OmniMotion while achieving similar qualitative tracking results on sequences with occlusions. Performance degrades somewhat on fast-moving scenes. Ablation studies demonstrate the importance of optimizing the initial depth estimates during training. While it does not match OmniMotion's robustness in all scenarios, Quasi-OmniFastTrack offers a compelling speed-accuracy trade-off, enabling long-term tracking on more videos within practical timeframes. Future work incorporating other modifications introduced in OmniFastTrack, such as long-term semantic features, could further improve tracking consistency.
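Lifting a pixel to 3D from a depth value amounts to back-projection. A minimal sketch, assuming a pinhole camera with hypothetical intrinsics (fx, fy, cx, cy); this illustrates the general operation, not the thesis code.

```python
def lift_pixel(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z to a 3D point in camera coordinates.

    Assumes a pinhole camera: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Hypothetical 640x480 camera with focal length 500 px and principal point at the centre.
print(lift_pixel(420.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0))  # (0.4, 0.0, 2.0)
```

Optimizing the depths during training, as the ablation suggests, moves these lifted points along their camera rays while keeping their projections fixed.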
New Variables of Brain Morphometry: the Potential and Limitations of CNN Regression
Timo Blattner · Sept. 2022.
The calculation of brain morphology variables is computationally very expensive and time-consuming. A previous work showed the feasibility of extracting the variables directly from T1-weighted brain MRI images using a convolutional neural network. We used significantly more data and extended their model to a new set of neuromorphological variables, which could become interesting biomarkers for the diagnosis of brain diseases in the future. For nearly all subjects, the model achieves a mean relative absolute error below 5%. This high relative accuracy can be attributed to the low morphological variance between subjects and to the ability of the model to predict the age-related cortical atrophy trend. However, the model fails to capture all the variance in the data and shows large regional differences. We attribute these limitations in part to the moderate to poor reliability of the ground truth generated by FreeSurfer. We further investigated the effects of training data size and model complexity on this regression task and found that dataset size had a significant impact on performance, while deeper models did not perform better. Lack of interpretability and dependence on a silver-standard ground truth are the main drawbacks of this direct regression approach.
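The abstract reports a per-subject mean relative absolute error. Assuming it is computed as the mean of |prediction - truth| / |truth| over a subject's morphometric variables (the exact definition used in the thesis is not stated), a minimal sketch:

```python
def mean_relative_absolute_error(y_true, y_pred):
    """Mean of |pred - true| / |true| over paired values (true values must be non-zero)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical variables for one subject (e.g. regional volumes in arbitrary units).
print(mean_relative_absolute_error([2.0, 4.0], [2.1, 3.8]))  # approximately 0.05, i.e. 5%
```

Because the error is relative, a model can score well on variables with low between-subject variance even if it captures little of that variance, which is one reading of the limitation the abstract describes.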
Home Monitoring by Radar
Lars Ziegler · Sept. 2022.
Detection and tracking of humans via ultra-wideband (UWB) radar is a promising and continuously evolving field with great potential for medical technology. This contactless method of acquiring data on a patient's movement patterns is ideal for in-home application. As irregularities in a patient's movement patterns are an indicator of various health problems, including neurodegenerative diseases, the insight this data could provide may enable earlier detection of such problems. In this thesis, a signal processing pipeline is presented with which a person's movement is modeled. During an experiment, 142 measurements were recorded by two separate radar systems and one lidar system, each consisting of multiple sensors. The models computed from these measurements by the signal processing pipeline were used to predict the times at which a person stood up or sat down. The predictions achieved an accuracy of 72.2%.
Revisiting non-learning based 3D reconstruction from multiple images
Aaron Sägesser · Oct. 2021.
Arthroscopy involves challenging tasks and requires skills that, even today, young surgeons still acquire directly during surgery. Existing simulators are expensive and rarely available. Given the growing potential of (head-mounted) virtual reality (VR) devices for simulation and their applicability in the medical context, these devices have become a promising alternative that would be orders of magnitude cheaper and could be made widely available. The overall aim of our project is to build a VR-based training device for arthroscopy, as this would be of great benefit and might even be applicable to other minimally invasive surgery (MIS). This thesis marks a first step in the project, focusing on exploring and comparing well-known algorithms for multi-view stereo (MVS) based 3D reconstruction applied to imagery acquired by an arthroscopic camera. Alongside this reconstruction, we aim to obtain essential measurements for comparing the VR environment to the real world, to validate the realism of future VR tasks. We evaluate 3 different feature extraction algorithms with 3 different matching techniques and 2 different algorithms for estimating the fundamental (F) matrix. These 18 setups are evaluated with a reconstruction pipeline implemented in Python, embedded in a Jupyter notebook and based on common computer vision libraries, and compared with results on imagery captured with a mobile phone as well as with the reconstruction results of the state-of-the-art (SOTA) structure-from-motion (SfM) software COLMAP and Multi-View Environment (MVE). Our comparative analysis reveals the challenges posed by the heavy distortion, fish-eye shape and poor image quality of arthroscopic imagery, as all results are substantially worse on this data. However, there are large differences between the setups.
Scale-Invariant Feature Transform (SIFT) and Oriented FAST and Rotated BRIEF (ORB), in combination with k-nearest neighbour (kNN) matching and Least Median of Squares (LMedS), yield the most promising results. Overall, the 3D reconstruction pipeline is a useful tool for obtaining measurements from the arthroscopic exploration device and complements the comparative research in this context.
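kNN matching of descriptors is commonly done with k = 2 and a ratio test that keeps a match only when the nearest neighbour is clearly closer than the second nearest. The abstract does not say whether a ratio test was used, so the sketch below is an assumption; the 0.75 threshold is a common but arbitrary choice. It uses Euclidean distance, which suits float descriptors such as SIFT; binary ORB descriptors are normally compared with Hamming distance instead.

```python
import math


def knn_ratio_match(desc_a, desc_b, ratio=0.75):
    """Match each descriptor in desc_a to desc_b with 2-NN search and a ratio test.

    Returns (index_a, index_b) pairs. Brute-force O(len(a) * len(b)).
    """
    if len(desc_b) < 2:
        return []
    matches = []
    for i, a in enumerate(desc_a):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        (d1, j1), (d2, _) = dists[0], dists[1]
        # Keep the match only if it is clearly better than the runner-up.
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches


a = [(0.0, 0.0), (5.0, 5.0), (4.0, 4.0)]
b = [(0.1, 0.0), (3.0, 3.0), (5.0, 5.2)]
print(knn_ratio_match(a, b))  # [(0, 0), (1, 2)]; (4, 4) is ambiguous and rejected
```

The surviving matches would then be passed to a robust estimator such as LMedS to fit the fundamental matrix and discard remaining outliers.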
Examination of Unsupervised Representation Learning by Predicting Image Rotations
Eric Lagger · Sept. 2020.
In recent years, deep convolutional neural networks have made considerable progress. Training such a network requires a lot of data, and supervised learning algorithms require that data to be labeled. Labeling data takes a great deal of human work, time and money. To avoid these inconveniences, we would like to find systems that do not need labeled data, i.e. unsupervised learning algorithms. This is why unsupervised algorithms are important, even though their results are not yet on the same level as those of supervised algorithms. In this thesis, we discuss one such approach and compare the results to other papers. A deep convolutional neural network is trained to recognize the rotation that has been applied to a picture: we take a large number of images, apply some simple rotations, and the task of the network is to predict how each image has been rotated. The data does not need to be labeled with any category or anything else. As long as the original pictures are upright, we hope that solving this task forces the network to learn high-level patterns.
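The labels for this pretext task come for free from the rotations themselves. A minimal sketch of how such training pairs could be generated, assuming the four rotations 0°, 90°, 180° and 270° (the RotNet setting; the abstract only says "some simple rotations") and images represented as 2D lists of pixel values.

```python
def rot90(img):
    """Rotate a 2D list (rows of pixels) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def rotation_pretext_samples(images):
    """Return (rotated_image, label) pairs, where label k means k * 90 degrees CCW."""
    samples = []
    for img in images:
        rotated = img
        for k in range(4):
            samples.append((rotated, k))
            rotated = rot90(rotated)
    return samples


img = [[1, 2],
       [3, 4]]
for rotated, label in rotation_pretext_samples([img]):
    print(label, rotated)
# 0 [[1, 2], [3, 4]]
# 1 [[2, 4], [1, 3]]
# 2 [[4, 3], [2, 1]]
# 3 [[3, 1], [4, 2]]
```

A classifier with four outputs is then trained on these pairs, and its convolutional features are reused for downstream tasks.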
StitchNet: Image Stitching using Autoencoders and Deep Convolutional Neural Networks
Maurice Rupp · Sept. 2019.
This thesis explores the prospects of artificial neural networks for image processing tasks. More specifically, it aims to stitch multiple overlapping images into a bigger, panoramic picture. Until now, this task has been approached solely with "classical", hardcoded algorithms, while deep learning is at most used for specific subtasks. This thesis introduces a novel end-to-end neural network approach to image stitching called StitchNet, which uses a pre-trained autoencoder and deep convolutional networks. In addition to presenting several new datasets for supervised image stitching, each with 120’000 training and 5’000 validation samples, this thesis conducts various experiments with existing networks designed for image super-resolution and image segmentation, adapted to the task of image stitching. StitchNet outperforms most of the adapted networks in both quantitative and qualitative results.
Facial Expression Recognition in the Wild
Luca Rolshoven · Sept. 2019.
The idea of inferring the emotional state of a subject by looking at their face is nothing new. Neither is the idea of automating this process using computers. Researchers used to computationally extract handcrafted features that had proven effective from face images and then used machine learning techniques to classify the facial expressions based on these features. Recently, there has been a trend towards using deep learning, and especially Convolutional Neural Networks (CNNs), to classify facial expressions. Researchers were able to achieve good results on images taken in laboratories under the same or at least similar conditions. However, these models do not perform very well on more arbitrary face images with varying head poses and illumination. This thesis aims to show the challenges of Facial Expression Recognition (FER) in this wild setting. It presents the currently used datasets and the current state-of-the-art results on one of the largest facial expression datasets available. The contributions of this thesis are twofold. First, I analyze three well-known neural network architectures and their effectiveness in classifying facial expressions. Second, I present two modifications of one of these networks that lead to the proposed STN-COV model. While this model does not outperform all current state-of-the-art models, it does outperform several of them.
A Study of 3D Reconstruction of Varying Objects with Deformable Parts Models
Raoul Grossenbacher · July 2019.
This work covers a new approach to 3D reconstruction. In traditional 3D reconstruction, one uses multiple images of the same object to compute a 3D model, exploiting the differences between the images (camera position, illumination, rotation of the object, and so on) to compute a point cloud representing the object. A characteristic trait shared by all these approaches is that almost everything about the images may change except the object itself, because correspondences must be found between the images. To be able to use different instances of the same kind of object, we used a 3D DPM model that can find different parts of an object in an image, thereby detecting correspondences between the different pictures, which we can then use to compute the 3D model. To put this into practice, we fed a 3D DPM model trained to detect cars with pictures of different car brands, in which no two images showed the same vehicle, and used the detected correspondences and the Factorization Method to compute the 3D point cloud. This technique constitutes a new approach to 3D reconstruction, because changing the object itself between images had never been done before.
Motion Deblurring in the Wild: Replication and Improvements
Alvaro Juan Lahiguera · Jan. 2019.
Coma Outcome Prediction with Convolutional Neural Networks
Stefan Jonas · Oct. 2018.
Automatic Correction of Self-Introduced Errors in Source Code
Sven Kellenberger · Aug. 2018.
Neural Face Transfer: Training a Deep Neural Network to Face-Swap
Till Nikolaus Schnabel · July 2018.
This thesis explores artificial neural networks that produce realistic-looking visual outputs. It aims to morph face pictures of a specific identity to look like another individual by modifying only key features, such as eye color, while leaving identity-independent features unchanged. Prior works have covered symmetric translation between two specific domains but failed to optimize it for faces, where only parts of the image may be changed. This work applies a face-masking operation to the output at training time, which forces the image generator to preserve colors while altering the face, fitting it naturally into the unmorphed surroundings. Various experiments are conducted, including an ablation study on the final setting, which decreases the baseline identity-switching performance from 81.7% to 75.8% while improving the average χ² color distance from 0.551 to 0.434. The accompanying software gives users easy access to applying this neural face swap to images and videos of arbitrary crop and brings computer vision one step closer to replacing computer graphics in this specific area.
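The χ² color distance compares two color histograms. The abstract does not give the exact variant, so the sketch below assumes the common symmetric form ½ Σ (a − b)² / (a + b) over normalized histogram bins, with a small `eps` (a hypothetical parameter) guarding empty bins.

```python
def chi2_distance(h1, h2, eps=1e-10):
    """Symmetric chi-squared distance: 0.5 * sum((a - b)^2 / (a + b))."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


# Hypothetical 3-bin normalized color histograms of an input and an output face region.
print(chi2_distance([0.5, 0.5, 0.0], [0.25, 0.75, 0.0]))  # about 0.0667
print(chi2_distance([0.2, 0.8], [0.2, 0.8]))              # 0.0
```

A lower value means the generated image keeps a color distribution closer to the original, which is what the masking operation is meant to encourage.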
A Study of the Importance of Parts in the Deformable Parts Model
Sammer Puran · June 2017.
Self-Similarity as a Meta Feature
Lucas Husi · April 2017.
A Study of 3D Deformable Parts Models for Detection and Pose-Estimation
Simon Jenni · March 2015.
Accelerated Federated Learning on Client Silos with Label Noise: RHO Selection in Classification and Segmentation
Irakli Kelbakiani · May 2024.
Federated Learning has recently attracted growing research interest, driven by factors including the growth of decentralized data, privacy concerns, and new privacy regulations. In Federated Learning, client servers independently train a model on their local datasets, and the local models are subsequently aggregated into a global model that achieves better overall performance. Sending local model weights instead of the entire dataset is a significant advantage of Federated Learning over centralized classical machine learning algorithms. However, Federated Learning involves uploading and downloading model parameters many times, i.e. multiple communication rounds between the global server and the remote client servers, which poses challenges. A high number of communication rounds not only increases costly communication overhead but is also a critical limitation for servers with low network bandwidth, leading to latency and a higher probability of training failures caused by communication breakdowns. To mitigate these challenges, we aim to provide a fast-converging Federated Learning training methodology that decreases the number of necessary communication rounds. We build on the Reducible Holdout Loss Selection (RHO-Loss) batch selection methodology, which "selects low-noise, task-relevant, non-redundant points for training" [1]. We hypothesize that if client silos employ RHO-Loss and thereby avoid training their local models on noisy and non-relevant samples, clients may offer stable and consistent updates to the global server, which could lead to faster convergence of the global model. Our contribution focuses on investigating the RHO-Loss method in a simulated federated setting on the Clothing1M dataset. We also examine its applicability to medical datasets and check its effectiveness in a simulated federated environment.
Our experimental results show a promising outcome, specifically a reduction in communication rounds for the Clothing1M dataset. However, as the success of the RHO-Loss selection method depends on the availability of sufficient training data for the target RHO model and for the Irreducible RHO model, we emphasize that our contribution applies to those Federated Learning scenarios where client silos hold enough training data to successfully train and benefit from their RHO model on their local dataset.
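In RHO-Loss, each candidate point is scored by its reducible holdout loss: the current model's training loss minus the "irreducible" loss assigned by a model trained on holdout data, and only the top-scoring points are used for the gradient step. A minimal sketch of that selection step on one client, assuming both losses have already been computed (the loss values below are hypothetical).

```python
def rho_loss_select(train_losses, irreducible_losses, n_select):
    """Return sorted indices of the n_select points with the largest reducible holdout loss."""
    if len(train_losses) != len(irreducible_losses):
        raise ValueError("loss lists must have equal length")
    reducible = [t - irr for t, irr in zip(train_losses, irreducible_losses)]
    order = sorted(range(len(reducible)), key=lambda i: reducible[i], reverse=True)
    return sorted(order[:n_select])


# Point 0: high loss but also high irreducible loss (likely a noisy label) -> skipped.
# Point 1: already learned (low loss) -> skipped.
# Points 2 and 3: learnable and not yet learned -> selected.
print(rho_loss_select([2.0, 0.1, 3.0, 1.5], [1.9, 0.05, 0.5, 0.2], 2))  # [2, 3]
```

The dependence on a well-trained irreducible-loss model is exactly why the approach needs clients with enough local data, as noted above.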
Amodal Leaf Segmentation
Nicolas Maier · Nov. 2023.
Plant phenotyping is the process of measuring and analyzing various traits of plants. It provides essential information on how genetic and environmental factors affect plant growth and development. Manual phenotyping is highly time-consuming; therefore, many computer vision and machine learning based methods have been proposed in the past years to perform this task automatically based on images of the plants. However, the publicly available datasets (in particular, of Arabidopsis thaliana) are limited in size and diversity, so models trained on them do not generalize well to new, unseen environments. In this work, we propose a complete pipeline able to automatically extract traits of interest from an image of Arabidopsis thaliana. Our method uses a minimal amount of existing annotated data from a source domain to generate a large synthetic dataset adapted to a different target domain (e.g., different backgrounds, lighting conditions, and plant layouts). In addition, unlike the source dataset, the synthetic one provides ground-truth annotations for the occluded parts of the leaves, which are relevant when measuring some characteristics of the plant, e.g., its total area. This synthetic dataset is then used to train a model to perform amodal instance segmentation of the leaves to obtain the total area, leaf count, and color of each plant. To validate our approach, we create a small dataset of manually annotated real images of Arabidopsis thaliana, which is used to assess the performance of the models.
Assessment of movement and pose in a hospital bed by ambient and wearable sensor technology in healthy subjects
Tony Licata · Sept. 2022.
The use of automated systems to describe human motion has become possible in various domains. Most of the proposed systems are designed to work with people moving around in a standing position. Because such a system could be useful in a medical environment, in this work we propose a pipeline that can effectively predict the motion of people lying in bed. The proposed pipeline is tested on a dataset of 41 participants executing 7 predefined tasks in a bed. The motion of the participants is measured with video cameras, accelerometers and a pressure mat. Various experiments are carried out with the information retrieved from the dataset, and two approaches for combining the data from the different measurement technologies are explored. The performance of each experiment is measured, and the proposed pipeline is built from the components that provide the best results. Finally, we show that the proposed pipeline only needs the video cameras, which makes the proposed setup easier to implement in real-life situations.
Machine Learning Based Prediction of Mental Health Using Wearable-measured Time Series
Seyedeh Sharareh Mirzargar · Sept. 2022.
Depression is the second major cause of years spent with disability and has a growing prevalence in adolescents. The recent Covid-19 pandemic has intensified the situation and limited in-person patient monitoring due to distancing measures. Recent advances in wearable devices have made it possible to record the rest/activity cycle remotely, with high precision and in real-world contexts. We aim to use machine learning methods to predict an individual's mental health based on wearable-measured sleep and physical activity. Predicting an impending mental health crisis in an adolescent allows for prompt intervention, detection of depression onset or recurrence, and remote monitoring. To achieve this goal, we train three primary forecasting models, namely linear regression, random forest, and light gradient boosted machine (LightGBM), and two deep learning models, namely block recurrent neural network (block RNN) and temporal convolutional network (TCN), on Actigraph measurements to forecast mental health in terms of depression, anxiety, sleepiness, stress, sleep quality, and behavioral problems. Our models achieve high forecasting performance, with the random forest performing best and reaching an accuracy of 98% for forecasting trait anxiety. We perform extensive experiments to evaluate the models' accuracy, generalization, and feature utilization, using a naive forecaster as the baseline. Our analysis shows minimal mental health changes over two months, which makes the prediction task easy. Because of these minimal changes, the models tend to rely primarily on the historical values of the mental health evaluations instead of Actigraph features. At the time of this master's thesis, data acquisition is still in progress. In future work, we plan to train the models on the complete dataset using a longer forecasting horizon, so that larger mental health changes occur, and to perform transfer learning to compensate for the small dataset size.
This interdisciplinary project demonstrates the opportunities and challenges of machine learning based prediction of mental health, paving the way toward using the same techniques to forecast other conditions such as internalizing disorders, Parkinson's disease, and Alzheimer's disease, and toward improving the quality of life of individuals with mental disorders.
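The naive baseline matters because, when scores barely change, simply repeating the last observation is already very accurate. A minimal sketch, assuming the naive forecaster uses the "last value" strategy (one common choice; the strategy used in the thesis is not stated) and hypothetical questionnaire scores.

```python
def naive_forecast(history, horizon):
    """Persistence baseline: repeat the last observed value for `horizon` steps."""
    if not history:
        raise ValueError("history must not be empty")
    return [history[-1]] * horizon


def mean_absolute_error(actual, forecast):
    """Average absolute difference between observed and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


history = [42, 41, 42, 42]   # hypothetical trait-anxiety scores
actual_next = [42, 43]       # hypothetical future scores
forecast = naive_forecast(history, len(actual_next))
print(forecast)                                   # [42, 42]
print(mean_absolute_error(actual_next, forecast))  # 0.5
```

A learned model is only informative if it clearly beats this baseline; with nearly constant targets, it is hard to do so using Actigraph features alone.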
CNN Spike Detector: Detection of Spikes in Intracranial EEG using Convolutional Neural Networks
Stefan Jonas · Oct. 2021.
The detection of interictal epileptiform discharges in the visual analysis of electroencephalography (EEG) is an important but very difficult, tedious, and time-consuming task. There have been decades of research on computer-assisted detection algorithms, most recently focused on Convolutional Neural Networks (CNNs). In this thesis, we present the CNN Spike Detector, a convolutional neural network that detects spikes in intracranial EEG. Our dataset of 70 intracranial EEG recordings from 26 subjects with epilepsy introduces new challenges to this research field. We report cross-validation results with a mean AUC of 0.926 (± 0.04), an area under the precision-recall curve (AUPRC) of 0.652 (± 0.10), and 12.3 (± 7.47) false positive epochs per minute at a sensitivity of 80%. A visual examination of false positive segments is performed to understand the model behavior that leads to a relatively high false detection rate. We notice issues with the evaluation measures and highlight a major limitation of the common approach of detecting spikes using short segments, namely that the network cannot consider the wider context of the segment with regard to its origin. For this reason, we present the Context Model, an extension in which the CNN Spike Detector is supplied with additional information about the channel. Results show promising but limited performance improvements. This thesis provides important findings about the spike detection task for intracranial EEG and lays out promising future research directions toward a network capable of assisting experts in real-world clinical applications.
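Assuming the reported AUC is the area under the ROC curve, it equals the probability that a randomly chosen spike segment receives a higher score than a randomly chosen non-spike segment (ties counting one half). The pairwise sketch below is O(n²) and meant only to make that definition concrete; for real data a library routine such as scikit-learn's `roc_auc_score` is the usual choice. Labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical segment labels (1 = spike) and detector scores.
print(roc_auc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.5]))  # 0.75
```

Because spikes are rare, ROC AUC can look high even when many false positives occur, which is one reason AUPRC and false positives per minute are reported alongside it.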
PolitBERT - Deepfake Detection of American Politicians using Natural Language Processing
Maurice Rupp · April 2021.
This thesis explores the application of modern Natural Language Processing techniques to the detection of artificially generated videos of popular American politicians. Instead of focusing on detecting anomalies and artifacts in images and sounds, this thesis focuses on detecting irregularities and inconsistencies in the words themselves, opening up a new way to detect fake content. A novel, domain-adapted, pre-trained version of the language model BERT, combined with several mechanisms to overcome severe dataset imbalance, yielded the best quantitative and qualitative results. In addition to creating the biggest publicly available dataset of English-speaking politicians, consisting of 1.5M sentences from over 1000 people, this thesis conducts various experiments with different text classification and sequence processing algorithms applied to the political domain. Furthermore, multiple ablations of the techniques used to manage severe data imbalance are presented and evaluated.
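The abstract does not list the imbalance mechanisms used. One common option is to weight the loss of each class inversely to its frequency; the sketch below uses the formula n_samples / (n_classes × n_class_samples), the same one scikit-learn uses for `class_weight="balanced"`. The labels are hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class loss weights: n_samples / (n_classes * n_samples_in_class)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * m) for cls, m in counts.items()}


# Hypothetical, heavily imbalanced labels: 9 real sentences, 1 fake.
labels = ["real"] * 9 + ["fake"]
print(inverse_frequency_weights(labels))  # {'real': 0.555..., 'fake': 5.0}
```

With these weights, each class contributes the same total weight to the loss, so the classifier cannot minimize it by always predicting the majority class.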
A Study on the Inversion of Generative Adversarial Networks
Ramona Beck · March 2021.
The desire to use generative adversarial networks (GANs) for real-world tasks such as object segmentation or image manipulation is increasing as synthesis quality improves, which has given rise to an emerging research area called GAN inversion that focuses on exploring methods for embedding real images into the latent space of a GAN. In this work, we investigate different GAN inversion approaches using an existing generative model architecture that takes a completely unsupervised approach to object segmentation and is based on StyleGAN2. In particular, we propose and analyze algorithms for embedding real images into the different latent spaces Z, W, and W+ of StyleGAN following an optimization-based inversion approach, while also investigating a novel approach that allows fine-tuning of the generator during the inversion process. Furthermore, we investigate a hybrid and a learning-based inversion approach: in the former, we train an encoder with embeddings optimized by our best optimization-based inversion approach, and in the latter, we define an autoencoder, consisting of an encoder and the generator of our generative model as a decoder, and train it to map an image into the latent space. We demonstrate the effectiveness of our methods as well as their limitations through a quantitative comparison with existing inversion methods and by conducting extensive qualitative and quantitative experiments with synthetic data as well as real images from a complex image dataset. We show that we achieve qualitatively satisfying embeddings in the W and W+ spaces with our optimization-based algorithms, that fine-tuning the generator during the inversion process leads to qualitatively better embeddings in all latent spaces studied, and that the learning-based approach also benefits from a variable generator as well as from pre-training with our hybrid approach.
Furthermore, we evaluate our approaches on the object segmentation task and show that our optimization-based, hybrid and learning-based methods are all able to generate meaningful embeddings that achieve reasonable object segmentations. Overall, our proposed methods illustrate the potential of GAN inversion and its application to real-world tasks, especially in the relaxed version of GAN inversion where the weights of the generator are allowed to vary.
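Optimization-based inversion searches for a latent code z whose generated image G(z) is as close as possible to the target image x. The toy sketch below replaces StyleGAN2 with a hypothetical linear "generator" G(z) = Wz and uses a plain squared-error loss, so that the gradient 2Wᵀ(Wz − x) can be written by hand. Real inversion uses a deep nonlinear generator, perceptual losses and automatic differentiation.

```python
def invert_linear_generator(W, x, steps=500, lr=0.1):
    """Find z minimizing ||W z - x||^2 by gradient descent.

    W is a list of rows (d_out x d_in); x has length d_out. Both are hypothetical
    stand-ins for a generator and a target image.
    """
    d_in = len(W[0])
    z = [0.0] * d_in  # start from the zero latent code
    for _ in range(steps):
        residual = [sum(w * zj for w, zj in zip(row, z)) - xi for row, xi in zip(W, x)]
        # Gradient of ||W z - x||^2 with respect to z is 2 W^T (W z - x).
        grad = [2.0 * sum(W[i][j] * residual[i] for i in range(len(W))) for j in range(d_in)]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


W = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
x = [1.0, -1.0, 0.5]  # generated from the latent code [1.0, -0.5]
print(invert_linear_generator(W, x))  # close to [1.0, -0.5]
```

Fine-tuning the generator during inversion, as studied in the thesis, would correspond to also updating W in this loop instead of keeping it fixed.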
Multi-scale Momentum Contrast for Self-supervised Image Classification
Zhao Xueqi · Dec. 2020.
With the maturity of supervised learning, research focus is gradually shifting to self-supervised learning. "Momentum Contrast" (MoCo) proposes a new self-supervised learning method and raises the accuracy of self-supervised learning to a new level. Inspired by another article, "Representation Learning by Learning to Count", we hypothesize that dividing a picture into four parts and passing them through the neural network could further improve the accuracy of MoCo. Unlike the original MoCo, this variant (Multi-scale MoCo) does not pass the augmented images directly through the encoder. Instead, Multi-scale MoCo crops and resizes the augmented images, and the four resulting parts are each passed through the encoder and then summed (the upsampled version does not resize the input but instead resizes the contrastive samples). This cropping is applied not only to the query q but also to the key k used for comparison; otherwise the key-side weights might be damaged during the momentum update. This is discussed further in the experiments chapter, which compares the version that downsamples only the input samples with the version that downsamples both input and contrast samples. Human object recognition follows the same principle: when people see something familiar, they can still guess what the object is with high probability even if it is not fully visible. Multi-scale MoCo therefore applies this concept to the pretext part of MoCo, hoping to obtain better feature extraction. In this thesis, there are three versions of Multi-scale MoCo: one that downsamples the input samples, one that downsamples both the input and contrast samples, and one that upsamples the input samples. The differences between these versions are described in more detail later. The network architectures compared include ResNet50, and the dataset tested is STL-10.
The weights learned in the pretext task are then transferred to the downstream evaluation, during which all layers except the final linear layer are frozen and keep the weights learned in the pretext task.
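For readers unfamiliar with MoCo, the abstract relies on two ingredients: the momentum update of the key encoder and the contrastive (InfoNCE) loss of a query against its positive key and a queue of negative keys. The NumPy sketch below shows both, plus the "split into four parts, encode, sum" step of Multi-scale MoCo. It is a simplified illustration under stated assumptions: parameters are flat arrays, `encode` is a hypothetical stand-in for the ResNet50 encoder, the four parts are assumed to be image quadrants, resizing is omitted, and m = 0.999 and τ = 0.07 are the defaults of the original MoCo paper rather than values taken from this thesis.

```python
import numpy as np

def momentum_update(theta_k, theta_q, m=0.999):
    """Key-encoder update theta_k <- m * theta_k + (1 - m) * theta_q (no gradient reaches theta_k)."""
    return m * theta_k + (1.0 - m) * theta_q

def l2_normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def info_nce(q, k_pos, queue, tau=0.07):
    """Contrastive loss for one query q, its positive key k_pos and a (K x d) queue of negative keys."""
    q, k_pos, queue = l2_normalize(q), l2_normalize(k_pos), l2_normalize(queue)
    logits = np.concatenate(([q @ k_pos], queue @ q)) / tau   # the positive sits at index 0
    logits = logits - logits.max()                             # numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))         # -log softmax of the positive

def quadrant_features(image, encode):
    """Split an H x W image into four quadrants, encode each one and sum the features."""
    h, w = image.shape[0] // 2, image.shape[1] // 2
    tiles = [image[:h, :w], image[:h, w:], image[h:, :w], image[h:, w:]]
    return sum(encode(t) for t in tiles)
```

In a training loop, `quadrant_features` would produce both q (with the query encoder) and k (with the momentum-updated key encoder), which matches the abstract's point that the cropping must be applied to both.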
Self-Supervised Learning Using Siamese Networks and Binary Classifier
Dušan Mihajlov · March 2020.
In this thesis, we present several approaches for training a convolutional neural network using only unlabeled data. Our self-supervised learning algorithms are based on the relationship between an image patch (i.e., a zoomed-in region of an image) and its original image. Using a siamese neural network architecture, we aim to recognize whether the image patch fed to the first branch comes from the same image that is fed to the second branch. By applying transformations to both images, and by using different zoom levels at different positions, we force the network to extract high-level features with its convolutional layers. On top of the siamese architecture, a simple binary classifier measures the difference between the extracted feature maps and makes a decision. The classifier can therefore solve the task correctly only when the convolutional layers extract useful representations. These representations can then be used to solve many different tasks related to the data used for unsupervised training. As the main benchmark for all of our models, we use the STL-10 dataset, where we train a linear classifier on top of our convolutional layers using a small number of manually labeled images; this is a widely used benchmark for unsupervised learning. We also combine our idea with recent work on the same topic, the RotNet network, which uses image rotations and thereby forces the network to learn rotation-dependent features from the dataset. This combination yields a new procedure that outperforms the original RotNet.
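The pretext task above needs image pairs labelled "patch comes from this image" or "patch comes from another image". A minimal NumPy sketch of how such pairs might be sampled follows; it is an illustration, not the thesis's pipeline. The function names, the zoom range, nearest-neighbour resizing and the 50/50 split between positive and negative pairs are all assumptions, and the sampler needs at least two images.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an H x W (x C) array."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows][:, cols]

def sample_pair(images, rng, min_frac=0.3, max_frac=0.7):
    """Return (patch, full_image, label); label is 1 if the patch was cropped from full_image, else 0."""
    n = len(images)
    src = rng.integers(n)
    img = images[src]
    h, w = img.shape[:2]
    frac = rng.uniform(min_frac, max_frac)                 # zoom level of the patch
    ch, cw = max(1, int(h * frac)), max(1, int(w * frac))
    top, left = rng.integers(h - ch + 1), rng.integers(w - cw + 1)   # random crop position
    patch = resize_nearest(img[top:top + ch, left:left + cw], h, w)  # "zoom" back to full size
    if rng.random() < 0.5:
        return patch, img, 1                               # positive pair: same source image
    other = (src + rng.integers(1, n)) % n                 # any image except the source
    return patch, images[other], 0                         # negative pair
```

Each branch of the siamese network would then embed one element of the pair, and the binary classifier would predict the label from the two feature maps.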
Learning Object Representations by Mixing Scenes
Lukas Zbinden · May 2019.
In the digital age of ever-increasing data collection and accessibility, the demand for scalable machine learning models that can effectively refine this "new oil" is unprecedented. Unsupervised representation learning methods present a promising approach to exploiting this invaluable yet unlabeled digital resource at scale. However, most of these approaches focus on synthetic or simplified image datasets. What if a method could learn directly from natural Internet-scale image data? In this thesis, we propose a novel approach for unsupervised learning of object representations by mixing natural image scenes. Without any human supervision, our method mixes visually similar images to synthesize new, realistic scenes using adversarial training. In this process, the model learns to represent and understand the objects prevalent in natural image data and makes them available for downstream applications. For example, it enables the transfer of objects from one scene to another. Through qualitative experiments on complex image data, we show the effectiveness of our method along with its limitations. Moreover, we benchmark our approach quantitatively against state-of-the-art works on the STL-10 dataset. Our method demonstrates the potential of learning representations directly from natural image data and reinforces it as a promising avenue for future research.
Representation Learning using Semantic Distances
Markus Roth · May 2019.
Zero-Shot Learning Using Generative Adversarial Networks
Hamed Hemati · Dec. 2018.
Dimensionality Reduction via CNNs - Learning the Distance Between Images
Ioannis Glampedakis · Sept. 2018.
Learning to Play Othello Using Deep Reinforcement Learning and Self Play
Thomas Simon Steinmann · Sept. 2018.
ABA-J Interactive Multi-Modality Tissue Section-to-Volume Alignment: A Brain Atlasing Toolkit for ImageJ
Felix Meyenhofer · March 2018.
Learning Visual Odometry with Recurrent Neural Networks
Adrian Wälchli · Feb. 2018.
In computer vision, Visual Odometry is the problem of recovering the camera motion from a video. It is related to Structure from Motion, the problem of reconstructing 3D geometry from a collection of images. Decades of research in these areas have produced successful algorithms that are used in applications such as autonomous navigation, motion capture, and augmented reality. Despite the success of these prior works in real-world environments, their robustness is highly dependent on manual calibration and on the amount of noise present in the images in the form of, e.g., non-Lambertian surfaces, dynamic motion, and other sources of ambiguity. This thesis explores an alternative approach to the Visual Odometry problem via Deep Learning, that is, a specific form of machine learning with artificial neural networks. It focuses on the implementation of a recent work that proposes using Recurrent Neural Networks to learn dependencies over time, given the sequential nature of the input. Together with a convolutional neural network that extracts motion features from the input stream, the recurrent part accumulates knowledge from the past to estimate the camera pose at each point in time. The performance of this system is analyzed on real and synthetic data. The evaluation covers several ways of training the network as well as the impact and limitations of the recurrent connection for Visual Odometry.
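A visual odometry network predicts camera motion at every time step, and a trajectory is obtained by composing these motions as rigid transforms. The NumPy sketch below shows that composition for the simpler planar (SE(2)) case. It assumes, for illustration only, that each prediction is a relative motion (dx, dy, dθ) expressed in the previous camera frame; the thesis setting involves full 3D poses (SE(3)), and the function names are hypothetical.

```python
import numpy as np

def se2(dx, dy, dtheta):
    """Homogeneous 3 x 3 matrix of a planar rigid motion (rotation dtheta, translation (dx, dy))."""
    c, s = np.cos(dtheta), np.sin(dtheta)
    return np.array([[c, -s, dx],
                     [s,  c, dy],
                     [0.0, 0.0, 1.0]])

def accumulate(relative_motions):
    """Chain relative motions (each expressed in the previous frame) into absolute poses."""
    pose = np.eye(3)                         # start at the origin, facing +x
    trajectory = [pose]
    for dx, dy, dtheta in relative_motions:
        pose = pose @ se2(dx, dy, dtheta)    # right-multiply: the motion is given in the current frame
        trajectory.append(pose)
    return trajectory
```

For example, four steps of "move one unit forward, then turn 90 degrees" trace a unit square and return to the starting pose. Because every pose depends on all previous predictions, small per-step errors accumulate as drift, which is one motivation for models that carry information across time steps.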
Crime location and timing prediction
Bernard Swart · Jan. 2018.
From Cartoons to Real Images: An Approach to Unsupervised Visual Representation Learning
Simon Jenni · Feb. 2017.
Automatic and Large-Scale Assessment of Fluid in Retinal OCT Volume
Nina Mujkanovic · Dec. 2016.
Segmentation in 3D Using Eye-Tracking Technology
Michele Wyss · July 2016.
Accurate Scale Thresholding via Logarithmic Total Variation Prior
Remo Diethelm · Aug. 2014.
Unsupervised Object Segmentation with Generative Models
Adam Jakub Bielski · April 2024.
Advances in computer vision have transformed how we interact with technology, driven by significant breakthroughs in scalable deep learning and the availability of large datasets. These technologies now play a crucial role in many areas, from improving the user experience of applications such as digital photo library organization to advancing medical diagnostics and treatments. Despite these valuable applications, the creation of annotated datasets remains a significant bottleneck. It is not only costly and labor-intensive but also prone to inaccuracies and human biases. Moreover, it often requires specialized knowledge or careful handling of sensitive information. Among computer vision tasks, image segmentation particularly highlights these challenges, because it requires precise pixel-level annotations. This context underscores the need for unsupervised approaches in computer vision that can leverage the large volumes of unlabeled images produced every day. This thesis introduces several novel methods for learning fully unsupervised object segmentation models using only collections of images. Unlike much prior work, our approaches are effective on complex real-world images and do not rely on any form of annotation, including pre-trained supervised networks, bounding boxes, or class labels. We identify and leverage intrinsic properties of objects, most notably the cohesive movement of object parts, as powerful signals for driving unsupervised object segmentation. Using novel generative adversarial models, we apply this principle to either generate segmented objects or directly segment them in a way that allows for realistic movement within scenes. Our work demonstrates how such generated data can train a segmentation model that generalizes effectively to real-world images.
Furthermore, we introduce a method that, in conjunction with recent advances in self-supervised learning, achieves state-of-the-art results in unsupervised object segmentation. Our methods rely on Generative Adversarial Networks, which are known to be difficult to train and prone to mode collapse. We propose a new, more principled GAN loss whose gradients encourage the generator to explore missing modes of its distribution, addressing these limitations and improving the robustness of generative models.
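The idea that a correctly segmented object can be moved within a scene and still yield a realistic image can be illustrated with layered compositing. An image is formed as x = m ⊙ f + (1 − m) ⊙ b from a foreground f, its soft mask m and a background b, and f and m are shifted together before pasting. The NumPy sketch below covers only this compositing step; it is an assumption-laden illustration rather than the thesis's model. The generator networks and adversarial training that would produce f, m and b are not shown, the function name is hypothetical, and the circular shift via `np.roll` simplifies a real translation.

```python
import numpy as np

def composite(foreground, mask, background, shift=(0, 0)):
    """Paste a (shifted) foreground layer onto a background using a soft mask in [0, 1].

    foreground, background: H x W x C arrays; mask: H x W array.
    np.roll shifts circularly, a simplification of a true translation.
    """
    dy, dx = shift
    f = np.roll(foreground, (dy, dx), axis=(0, 1))
    m = np.roll(mask, (dy, dx), axis=(0, 1))[..., None]   # add a channel axis for broadcasting
    return m * f + (1.0 - m) * background
```

If the mask covers only part of an object, the shifted composite tears the object apart and looks unrealistic, so a discriminator judging such composites can push the masks toward whole, coherently moving objects.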
Novel Techniques for Robust and Generalizable Machine Learning
Abdelhak Lemkhenter · Sept. 2023.
Neural networks have transcended their status as a powerful proof-of-concept machine learning tool and become a highly disruptive technology that has revolutionized many quantitative fields such as drug discovery, autonomous vehicles, and machine translation. Today, it is nearly impossible to go a single day without interacting with a neural network-powered application. From search engines to on-device photo processing, neural networks have become the go-to solution thanks to recent advances in computational hardware and an unprecedented scale of training data. Larger and less curated datasets, typically obtained through web crawling, have greatly propelled the capabilities of neural networks forward. However, this increase in scale amplifies certain challenges associated with training such models. Beyond toy or carefully curated datasets, data in the wild is plagued with biases, imbalances, and various noisy components. Given the larger size of modern neural networks, such models run the risk of learning spurious correlations that fail to generalize beyond their training data. This thesis addresses the problem of training more robust and generalizable machine learning models across a wide range of learning paradigms for medical time series and computer vision tasks. Medical time series are a typical example of a low signal-to-noise-ratio data modality with a high degree of variability between subjects and datasets. There, we tailor the training scheme to focus on robust patterns that generalize to new subjects and to ignore noisier, subject-specific patterns. To achieve this, we first introduce a physiologically inspired unsupervised training task and then extend it by explicitly optimizing for cross-dataset generalization using meta-learning.
In the context of image classification, we address the challenge of training semi-supervised models under class imbalance by designing a novel label refinement strategy with higher local sensitivity to minority-class samples while preserving the global data distribution. Lastly, we introduce a new training loss for Generative Adversarial Networks (GANs). Such generative models could be used to improve the training of subsequent models in the low-data regime by augmenting the dataset with generated samples. Unfortunately, GAN training relies on a delicate balance between its components, making it prone to mode collapse. Our contribution is a more principled GAN loss whose gradients incentivize the generator to seek out missing modes in its distribution. All in all, this thesis tackles the challenge of training more robust machine learning models that can generalize beyond their training data. This necessitates the development of methods specifically tailored to handle the diverse biases and spurious correlations inherent in the data. It is important to note that achieving greater generalizability goes beyond simply increasing the volume of data; it requires careful consideration of training objectives and model architecture. By tackling these challenges, this research contributes to advancing the field of machine learning and underscores the significance of thoughtful design in obtaining more resilient and versatile models.
Automated Sleep Scoring, Deep Learning and Physician Supervision
Luigi Fiorillo · Oct. 2022.
Sleep plays a crucial role in human well-being. Polysomnography is used in sleep medicine as a diagnostic tool to objectively analyze the quality of sleep. Sleep scoring is the procedure of extracting sleep cycle information from whole-night electrophysiological signals. Worldwide, scoring is performed by sleep physicians according to the official American Academy of Sleep Medicine (AASM) scoring manual. In recent decades, a wide variety of deep learning-based algorithms have been proposed to automate the sleep scoring task. In this thesis, we study why these algorithms have not been adopted in daily clinical routine, with the aim of bridging the existing gap between automatic sleep scoring models and sleep physicians. In this light, the first step is the design of a simplified sleep scoring architecture that also provides an estimate of the model's uncertainty. Besides achieving results on par with the most up-to-date scoring systems, we demonstrate the effectiveness of ensemble learning-based algorithms, together with label smoothing techniques, in both enhancing the performance and calibrating the simplified scoring model. We introduce an uncertainty estimation procedure to identify the most challenging sleep stage predictions and to quantify the disagreement between the model's predictions and the physicians' annotations. We also propose a novel method to integrate inter-scorer variability into the training procedure of a sleep scoring model. We show that a deep learning model is able to encode this variability and thereby better adapt to the consensus of a group of scorer-physicians. Finally, we address the generalization ability of a deep learning-based sleep scoring system, further studying its resilience to sleep complexity and to the AASM scoring rules. We find that there is no need to train the algorithm strictly following the AASM guidelines.
Most importantly, using data from multiple data centers results in a better performing model compared with training on a single data cohort. The variability among different scorers and data centers needs to be taken into account, more than the variability among sleep disorders.
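One simple way to expose a model to inter-scorer variability is to train it on soft labels, i.e. the fraction of scorers who assigned each sleep stage to an epoch, optionally blended with label smoothing. The entropy of an ensemble's averaged prediction can then serve as a per-epoch uncertainty score. This is an assumption made for illustration and is not necessarily the exact procedure of the thesis. The NumPy sketch below uses the five AASM stages (W, N1, N2, N3, REM), and its function names are hypothetical.

```python
import numpy as np

STAGES = ["W", "N1", "N2", "N3", "REM"]

def soft_labels(annotations, n_classes=len(STAGES), smoothing=0.0):
    """(n_epochs, n_scorers) integer stage indices -> (n_epochs, n_classes) target distributions."""
    counts = np.stack([(annotations == c).sum(axis=1) for c in range(n_classes)], axis=1)
    probs = counts / annotations.shape[1]                      # fraction of scorers per stage
    return (1.0 - smoothing) * probs + smoothing / n_classes   # optional label smoothing

def cross_entropy(pred_probs, target_probs, eps=1e-12):
    """Mean cross-entropy between predicted and (soft) target distributions."""
    return -np.mean(np.sum(target_probs * np.log(pred_probs + eps), axis=1))

def ensemble_entropy(member_probs, eps=1e-12):
    """(n_models, n_epochs, n_classes) predictions -> entropy of the mean prediction per epoch."""
    mean = member_probs.mean(axis=0)
    return -np.sum(mean * np.log(mean + eps), axis=1)
```

An epoch on which all scorers agree gets a one-hot target, while a disputed epoch (for example, half N1 and half N2) gets a split target, so the model is not penalized for reproducing genuine disagreement.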
Learning Representations for Controllable Image Restoration
Givi Meishvili · March 2022.
Deep Convolutional Neural Networks have sparked a renaissance in all sub-fields of computer vision. Tremendous progress has been made in image restoration, and the research community has pushed the boundaries of image deblurring, super-resolution, and denoising. However, given a distorted image, most existing methods produce a single restored output, even though these tasks are inherently ill-posed and admit infinitely many plausible solutions. This thesis focuses on designing image restoration techniques that can produce multiple restored results and grant users more control over the restoration process. Towards this goal, we demonstrate how one can leverage the power of unsupervised representation learning. Image restoration is especially important for distorted images of human faces because of their social significance. Generative Adversarial Networks (GANs) enable an unprecedented level of generated facial detail combined with a smooth latent space. We leverage the power of GANs to learn controllable neural face representations. We demonstrate how to learn an inverse mapping from image space to these latent representations, tune these representations towards a specific task, and finally manipulate latent codes in these spaces. For example, we show how GANs and their inverse mappings enable the restoration and editing of faces in the context of extreme face super-resolution, and the generation of sharp novel-view videos from a single motion-blurred image of a face. This thesis also addresses the more general problems of blind super-resolution, denoising, and scratch removal, where blur kernels and noise levels are unknown. We resort to contrastive representation learning to first learn a latent space of degradations. We demonstrate that the learned representation allows inference of the ground-truth degradation parameters and can guide the restoration process.
Moreover, it enables control over the amount of deblurring and denoising in the restoration via manipulation of latent degradation features.
Learning Generalizable Visual Patterns Without Human Supervision
Simon Jenni · Oct. 2021.
Owing to the existence of large labeled datasets, Deep Convolutional Neural Networks have ushered in a renaissance in computer vision. However, almost all of the visual data we generate daily (several human lifetimes' worth of it) remains unlabeled and thus out of reach of today's dominant supervised learning paradigm. This thesis focuses on techniques that steer deep models towards learning generalizable visual patterns without human supervision. Our primary tool in this endeavor is the design of Self-Supervised Learning tasks, i.e., pretext tasks whose labels do not involve human labor. Besides enabling learning from large amounts of unlabeled data, we demonstrate how self-supervision can capture relevant patterns that supervised learning largely misses. For example, we design learning tasks that learn deep representations capturing shape from images, motion from video, and 3D pose features from multi-view data. Notably, the design of these tasks follows a common principle: the recognition of data transformations. The strong performance of the learned representations on downstream vision tasks such as classification, segmentation, action recognition, and pose estimation validates this pretext-task design. This thesis also explores the use of Generative Adversarial Networks (GANs) for unsupervised representation learning. Besides leveraging generative adversarial learning to define image transformations for self-supervised learning tasks, we also address training instabilities of GANs through the use of noise. While unsupervised techniques can significantly reduce the burden of supervision, in the end we still rely on some annotated examples to fine-tune the learned representations towards a target task. To improve learning from scarce or noisy labels, we describe a supervised learning algorithm with improved generalization in these challenging settings.
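The common design principle named above, recognizing which transformation was applied to an image, can be sketched as a labelling function. The function samples a transformation from a fixed set, applies it, and uses the transformation's index as a free label for a classifier. The NumPy sketch below is illustrative only: the transformation set (identity, horizontal flip, 90° and 180° rotation) and the function name are assumptions, not the transformations used in the thesis.

```python
import numpy as np

TRANSFORMS = [
    lambda x: x,                 # 0: identity
    lambda x: x[:, ::-1],        # 1: horizontal flip
    lambda x: np.rot90(x, 1),    # 2: rotate by 90 degrees
    lambda x: np.rot90(x, 2),    # 3: rotate by 180 degrees
]

def make_pretext_batch(images, rng):
    """Apply a random transformation to each image; the transformation index serves as the label."""
    labels = rng.integers(len(TRANSFORMS), size=len(images))
    inputs = [TRANSFORMS[t](img) for img, t in zip(images, labels)]
    return inputs, labels
```

A network trained to predict `labels` from `inputs` never sees a human annotation. To succeed, it has to pick up cues such as object orientation or shape, which is the intuition behind this family of pretext tasks.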
Learning Interpretable Representations of Images
Attila Szabó · June 2019.
Computers represent images with pixels, and each pixel contains three numbers for the red, green, and blue colour values. These numbers are meaningless to humans, and they are mostly useless when used directly with classical machine learning techniques such as linear classifiers. Interpretable representations are attributes that humans understand: the colour of the hair, the viewpoint of a car, or the 3D shape of an object in the scene. Many computer vision tasks can be viewed as learning interpretable representations; for example, a supervised classification algorithm directly learns to represent images by their class labels. In this work, we aim to learn interpretable representations (or features) indirectly, with lower levels of supervision. This approach has the advantages of saving on dataset annotation costs and of allowing the features to be used for multiple follow-up tasks. We make contributions in three main areas: weakly supervised learning, unsupervised learning, and 3D reconstruction. In the weakly supervised case, we use image pairs as supervision. Each pair shares a common attribute and differs in a varying attribute. We propose a training method that learns to separate the attributes into separate feature vectors. These features are then used for attribute transfer and classification. We also show theoretical results on the ambiguities of the learning task and on ways to avoid degenerate solutions. We present a method for unsupervised representation learning that separates semantically meaningful concepts. We explain, and support with ablation studies, how the components of our proposed method work: a mixing autoencoder, a generative adversarial net, and a classifier. We propose a method for learning single-image 3D reconstruction. It uses only images; no human annotation, stereo, synthetic renderings, or ground-truth depth maps are needed. We train a generative model that learns the 3D shape distribution and an encoder that reconstructs the 3D shape.
For this, we exploit the notion of image realism: the 3D reconstruction of the object has to look realistic when it is rendered from different random angles. We prove the efficacy of our method from first principles.
Learning Controllable Representations for Image Synthesis
Qiyang Hu · June 2019.
In this thesis, we focus on learning a controllable representation and applying the learned controllable feature representation to image synthesis, video generation, and even 3D reconstruction. We propose different methods to disentangle the feature representation in neural networks and analyze the challenges of disentanglement, such as reference ambiguity and the shortcut problem, that arise when using weak labels. We use the disentangled feature representation to transfer attributes between images, such as exchanging hairstyles between two face images. Furthermore, we study how another type of feature, the sketch, works in a neural network. A sketch can provide the shape and contour of an object, such as the silhouette of a side-view face. We leverage the silhouette constraint to improve 3D face reconstruction from 2D images. A sketch can also indicate the direction of motion of an object, so we investigate how to manipulate an object to follow a trajectory provided by a user sketch. We propose a method to automatically generate video clips from a single input image, using the sketch as motion and trajectory guidance to animate the object in that image. We demonstrate the effectiveness of our approaches on several synthetic and real datasets.
Beyond Supervised Representation Learning
Mehdi Noroozi · Jan. 2019.
The complexity of any information processing task is highly dependent on the space in which the data is represented. Unfortunately, pixel space is not appropriate for computer vision tasks such as object classification. Traditional computer vision approaches involve a multi-stage pipeline in which images are first transformed into a feature space through a handcrafted function, and the task is then solved in that feature space. The challenge with this approach is the difficulty of designing handcrafted functions that extract robust features. Deep learning approaches address this issue by training a neural network end-to-end on some task, which lets the network automatically discover an appropriate representation for that task. It turns out that training for image classification on large-scale annotated datasets yields a representation that transfers to other computer vision tasks. However, supervised representation learning is limited by the need for annotations. In this thesis, we study self-supervised representation learning, where the goal is to alleviate this limitation by replacing the classification task with pseudo-tasks whose labels come for free. We discuss self-supervised learning by solving jigsaw puzzles, which uses context as a supervisory signal. The rationale behind this task is that the network needs to extract features about object parts and their spatial configuration to solve the jigsaw puzzles. We also discuss a representation learning method that uses an artificial supervisory signal based on counting visual primitives. This supervisory signal is obtained from an equivariance relation. We use two image transformations in the context of counting: scaling and tiling. The first transformation exploits the fact that the number of visual primitives should be invariant to scale. The second transformation allows us to equate the sum of the visual primitive counts over all tiles to the count in the whole image.
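The two counting constraints can be written as a single loss. Let φ map an image to a vector of "counts", let D downsample the image by a factor of two, and let T_1..T_4 extract its four tiles. Then φ(D x) should equal Σ_j φ(T_j x). A contrastive term requires a different image y to violate this equality by a margin, which rules out the trivial solution φ = 0. The NumPy sketch below follows that formulation as we understand it from "Representation Learning by Learning to Count"; φ is a stand-in for the counting network, 2x2 average pooling is assumed for D, and the margin value is arbitrary.

```python
import numpy as np

def downsample2(x):
    """2x average pooling of an H x W (x C) image (odd trailing rows/columns are dropped)."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * w].reshape(h, 2, w, 2, *x.shape[2:]).mean(axis=(1, 3))

def tiles2x2(x):
    """The four non-overlapping quadrants of x, each the same size as downsample2(x)."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return [x[:h, :w], x[:h, w:2 * w], x[h:2 * h, :w], x[h:2 * h, w:2 * w]]

def counting_loss(phi, x, y, margin=10.0):
    """Equivariance term for x plus a hinge that keeps phi(D y) away from the tile sum of x."""
    tile_sum = sum(phi(t) for t in tiles2x2(x))
    d_pos = np.sum((phi(downsample2(x)) - tile_sum) ** 2)   # counts of D x vs. sum over tiles
    d_neg = np.sum((phi(downsample2(y)) - tile_sum) ** 2)   # a different image should not match
    return d_pos + max(0.0, margin - d_neg)
```

A feature extractor that outputs the same vector for every input makes the first term vanish but pays the full margin in the second, so minimizing the loss forces φ to respond to image content.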
The most effective transfer strategy is fine-tuning, which restricts one to using the same model, or parts of it, for both the pretext and the target tasks. We discuss a novel framework for self-supervised learning that overcomes limitations in designing and comparing different tasks, models, and data domains. In particular, our framework decouples the structure of the self-supervised model from the final task-specific fine-tuned model. Finally, we study the problem of multi-task representation learning. A naive approach to enhancing the representation learned by a task is to train the task jointly with other tasks that capture orthogonal attributes. Having a diverse set of auxiliary tasks poses challenges for multi-task training from scratch. We propose a framework that allows us to combine arbitrarily different feature spaces into a single deep neural network. We reduce the auxiliary tasks to classification tasks and, consequently, multi-task learning to a multi-label classification task. Nevertheless, combining multiple representation spaces without being aware of the target task might be suboptimal. As our second contribution, we show empirically that this is indeed the case and propose to combine multiple tasks after fine-tuning on the target task.
Motion Deblurring from a Single Image
Meiguang Jin · Dec. 2018.
With the information explosion, a tremendous number of photos are captured and shared via social media every day. Technically, a photo requires a finite exposure to accumulate light from the scene. Thus, objects moving during the exposure generate motion blur in the photo. Motion blur is an image degradation that makes visual content less interpretable and is therefore often seen as a nuisance. Although motion blur can be reduced by setting a short exposure time, the resulting lack of light has to be compensated for by increasing the sensor's sensitivity, which inevitably introduces a large amount of sensor noise. This motivates removing motion blur computationally. Motion deblurring is an important problem in computer vision, and it is challenging because it is ill-posed, i.e., the solution is not uniquely determined. Mathematically, a blurry image caused by uniform motion is formed by convolving a latent sharp image with a blur kernel. Potentially, infinitely many pairs of blur kernel and latent sharp image can produce the same blurry image. Hence, some prior knowledge or regularization is required to address this problem. Even if the blur kernel is known, restoring the latent sharp image is still difficult because high-frequency information has been lost. Although we can model the uniform motion deblurring problem mathematically, this model only covers in-plane translational camera motion. In practice, motion is more complicated and can be non-uniform. Non-uniform motion blur can come from many sources: out-of-plane camera rotation, scene depth changes, object motion, and so on. Thus, non-uniform motion blur is more challenging to remove. In this thesis, our focus is motion blur removal. We aim to address four challenging motion deblurring problems. We start with the noise-blind image deblurring scenario, where the blur kernel is known but the noise level is unknown.
We introduce an efficient and robust solution based on a Bayesian framework that uses a smooth generalization of the 0-1 loss. We then study the blind uniform motion deblurring scenario, where both the blur kernel and the latent sharp image are unknown. We explore the relative scale ambiguity between the latent sharp image and the blur kernel to address this issue. Moreover, we study the face deblurring problem and introduce a novel deep learning network architecture to solve it. We also address the general motion deblurring problem; in particular, we aim to recover a sequence of 7 frames, each depicting an instantaneous motion of the objects in the scene.
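The uniform blur model y = k ∗ x + n and the ambiguities it creates can be checked numerically. Scaling the sharp image by s and the kernel by 1/s leaves the blurry image unchanged, and the trivial pair (delta kernel, blurry image) explains it too. Blind methods therefore need priors or constraints, commonly a non-negative kernel that sums to one. The minimal 1D NumPy sketch below uses a made-up box signal and kernel.

```python
import numpy as np

def blur(x, k, noise_std=0.0, rng=None):
    """Uniform blur model y = k * x + n in 1D ('same'-size convolution plus optional Gaussian noise)."""
    y = np.convolve(x, k, mode="same")
    if noise_std > 0:
        rng = rng if rng is not None else np.random.default_rng()
        y = y + noise_std * rng.standard_normal(y.shape)
    return y

x = np.zeros(32)
x[10:20] = 1.0                                 # sharp signal: a box
k = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
k /= k.sum()                                   # blur kernel normalised to sum to one

y1 = blur(x, k)
s = 2.5
y2 = blur(s * x, k / s)                        # brighter image, weaker kernel
print(np.allclose(y1, y2))                     # True: both pairs explain the same blurry signal

delta = np.array([1.0])
print(np.allclose(blur(y1, delta), y1))        # True: so does the trivial "no-blur" pair (delta, y1)
```

Fixing the kernel's sum to one removes the scale ambiguity, but not the no-blur solution. Excluding that solution is the job of the image prior.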
Towards a Novel Paradigm in Blind Deconvolution: From Natural to Cartooned Image Statistics
Daniele Perrone · July 2015.
In this thesis, we study the blind deconvolution problem. Blind deconvolution consists of estimating a sharp image and a blur kernel from an observed blurry image. Because the blur model admits several solutions, it is necessary to devise an image prior that favors the true blur kernel and sharp image. It has recently been shown that a class of blind deconvolution formulations and image priors has the no-blur solution as its global minimum. Despite this shortcoming, algorithms based on these formulations and priors can successfully solve blind deconvolution. In this thesis, we show that a suitable initialization can exploit the non-convexity of the problem and yield the desired solution. Based on these conclusions, we propose a novel "vanilla" algorithm stripped of any enhancement typically used in the literature. Despite its simplicity, our algorithm is able to compete with the top performers on several datasets. We have also investigated a remarkable behavior of a 1998 algorithm whose formulation has the no-blur solution as its global minimum: even when initialized at the no-blur solution, it converges to the correct solution. We show that this behavior is caused by an apparently insignificant implementation strategy that makes the algorithm no longer minimize the original cost functional. We also demonstrate that this strategy improves the results of our "vanilla" algorithm. Finally, we present a study of image priors for blind deconvolution. We provide experimental evidence supporting the recent belief that a good image prior is one that leads to a good blur estimate rather than one that is a good statistical model of natural images. By focusing attention on blur estimation alone, we show that good blur estimates can be obtained even when using images quite different from the true sharp image. This allows the use of image priors, such as those leading to "cartooned" images, that avoid the no-blur solution.
By using an image prior that produces "cartooned" images, we achieve state-of-the-art results on different publicly available datasets. We therefore suggest a paradigm shift in blind deconvolution: from modeling natural image statistics to modeling cartooned image statistics.
New Perspectives on Uncalibrated Photometric Stereo
Thoma Papadhimitri · June 2014.
This thesis investigates the problem of 3D reconstruction of a scene from 2D images. In particular, we focus on photometric stereo, a technique that computes the 3D geometry from at least three images taken from the same viewpoint under different illumination conditions. When the illumination is unknown (uncalibrated photometric stereo), the problem is ambiguous: different combinations of geometry and illumination can generate the same images. First, we resolve the ambiguity by exploiting the Lambertian reflectance maxima. These are points on curved surfaces where the normals are parallel to the light direction. We then propose a solution that can be computed in closed form and thus very efficiently. Our algorithm is also very robust and always yields the same estimate regardless of the initial ambiguity. We validate our method in real-world experiments and achieve state-of-the-art results. In this thesis, we also solve, for the first time, the uncalibrated photometric stereo problem under the perspective projection model. We show that, unlike in the orthographic case, one can uniquely reconstruct the normals of the object and the lights given only the input images and the camera calibration (focal length and image center). We also propose a very efficient algorithm, which we validate in synthetic and real-world experiments, and show that the proposed technique is a generalization of the orthographic case. Finally, we investigate the uncalibrated photometric stereo problem in the case where the lights are distributed near the scene. In this case, we propose an alternating minimization technique that converges quickly and overcomes the limitations of prior work that assumes distant illumination. We show experimentally that adopting a near-light model for real-world scenes yields very accurate reconstructions.
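For context, the calibrated Lambertian case, which the uncalibrated problem generalizes, reduces to linear least squares. Under m ≥ 3 known distant lights, the intensities of each pixel satisfy I = L g with g = ρ n (albedo times unit normal). The NumPy sketch below illustrates only this calibrated baseline. It assumes known light directions, no shadows and orthographic projection, and the function name is hypothetical. When L is unknown, g can only be recovered up to an ambiguity, which is the problem the thesis addresses.

```python
import numpy as np

def lambertian_normals(I, L):
    """Calibrated Lambertian photometric stereo.

    I: (m, n_pixels) intensities under m lights; L: (m, 3) unit light directions.
    Solves L @ G = I in the least-squares sense; each column of G is albedo * normal.
    Returns unit normals (3, n_pixels) and albedos (n_pixels,).
    """
    G, *_ = np.linalg.lstsq(L, I, rcond=None)
    albedo = np.linalg.norm(G, axis=0)
    normals = G / np.maximum(albedo, 1e-12)   # guard against division by zero
    return normals, albedo
```

With noise-free synthetic data and lights that illuminate every pixel, the normals and albedos are recovered exactly. Real images additionally require handling shadows and non-Lambertian highlights.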
METHODS article
Deep learning for understanding satellite imagery: an experimental survey.
- 1 Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- 2 neptune.ml, Warsaw, Poland
- 3 Warsaw University of Technology, Warsaw, Poland
- 4 deepsense.ai, Warsaw, Poland
- 5 Centre for Natural Resources Management, Analysis, Training and Policy Research (NARMA), Kathmandu, Nepal
- 6 Zurich University of Applied Sciences, Zürich, Switzerland
- 7 Anthropocene Labs, New York, NY, United States
- 8 Center for Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
- 9 International Institute of Information Technology Hyderabad, Hyderabad, India
Translating satellite imagery into maps requires intensive effort and time. As a result, the regions affected by disasters and conflicts are often left with inaccurate maps. The combination of recently available datasets and advances in computer vision made through deep learning has paved the way toward automated satellite image translation. To facilitate research in this direction, we introduce the Satellite Imagery Competition using a modified SpaceNet dataset. Participants had to develop segmentation models to detect the positions of buildings in satellite images. In this work, we present five approaches based on improvements of U-Net and Mask R-CNN (Mask Region-based Convolutional Neural Network) models, coupled with unique training adaptations using boosting algorithms, morphological filters, Conditional Random Fields, and custom losses. The good results from these models, as high as an average precision (AP) of 0.937 and an average recall (AR) of 0.959, demonstrate the feasibility of Deep Learning in automated satellite image annotation.
1. Introduction
Despite substantial advances in global human well-being, the world continues to experience humanitarian crises and natural disasters. Long-term and reignited conflicts affect people in many parts of the world, but accurate maps of the affected regions often either do not exist or have been made outdated by the disaster or conflict. Satellite imagery is readily available to humanitarian organizations, but translating images into maps is an intensive effort. Today, maps are produced by specialized organizations or in volunteer events such as mapathons, where imagery is annotated with roads, buildings, farms, rivers, etc. In this work, we explore how machine learning can help pave the way for automated analysis of satellite imagery to generate relevant, real-time maps.
Over the last few years, state-of-the-art results in deep learning have become increasingly applicable across many domains (LeCun et al., 2015). The main reasons are the advent of end-to-end approaches in deep learning (LeCun et al., 2015) and access to vast amounts of openly available data and high-performance compute. The same, however, does not hold true for the research community interested in satellite imagery and remote sensing. While access to high-performance compute infrastructure has not been an inhibiting factor, access to high-resolution imagery remains a major obstacle to high-quality AI/ML research in satellite imagery and remote sensing.
This work builds on a recently released open dataset, SpaceNet (v1) (Spacenet on aws, 2018). In partnership with DigitalGlobe, SpaceNet released raw multiband satellite imagery at resolutions as high as 30 cm for numerous cities, such as Las Vegas, Paris, Shanghai, and Khartoum, along with the corresponding annotations of buildings and roads. In this work, we focus on the problem of instance segmentation on a simplified version of the SpaceNet dataset, with the goal of detecting buildings in different urban settings in high-resolution satellite imagery. A large-scale competition was organized on the challenge platform crowdAI, which released a simplified version of the SpaceNet dataset (details in Section 3). The competition attracted 55 participants and 719 submissions. Many different architectures for image segmentation have been proposed in the past. Mask R-CNN (Mask Region-based Convolutional Neural Network) and U-Net type architectures are currently seen as state-of-the-art for such problems. This has been further substantiated by the success of these architectures in this competition and in satellite imagery applications more generally. The top contestants all fall into these two basic categories, and both categories perform at a similarly high level. Five different adaptations of U-Net and Mask R-CNN based approaches were applied to this problem and showed top performance in the segmentation challenge. This paper outlines the improvements and results of these five approaches. The next section reviews related work, with a particular focus on the development of U-Net and Mask R-CNN types of architectures. This is followed by a brief description of the dataset used and the evaluation metrics applied. Afterwards, the different methods are explained and presented together with accompanying results, and we analyze the effect of the depth of the U-Net structure on the results.
A brief section will provide a comparison of the approaches followed by the conclusion.
2. Related Work
Semantic segmentation deals with the task of assigning each pixel in a given image to one of potentially multiple classes. It involves recognizing which objects are shown and where exactly they appear in the image. This makes it a challenging task. On the one hand, it requires taking into account the overall context of the image and, for each pixel, the context of the surrounding area. On the other hand, it requires labeling each pixel individually at a very fine level of detail. Approaches to semantic segmentation have been around for a long time (see the review of more traditional approaches in Thoma (2016), or, for example, He et al. (2004) and Shotton et al. (2009)). More recently, the success of Deep Neural Networks in image-related tasks (Krizhevsky et al., 2012) has carried over to semantic segmentation as well. Deep Neural Networks, and in particular Convolutional Neural Networks, have revolutionized image classification during the last decade. They are now the dominant approach for this task and have led to deeper and deeper architectures (He et al., 2016). This became possible through algorithmic advances, such as rectified linear units that mitigate vanishing gradients during training (Krizhevsky et al., 2012). It also depended on convolutional and pooling layers that had originally been proposed long before (Fukushima, 1980). Such approaches address the question of what is shown in a given image. Each convolutional filter focuses on a small portion of the image and is moved over the whole image, which allows the network to learn progressively more abstract structures and invariances in images. Learning becomes efficient through weight sharing, and the whole network can be trained in an end-to-end fashion.
Together with pooling layers, the focus and receptive field of each deeper layer successively broaden until a very coarse latent space summarizes input from large portions of the image and can be used for classification. This step-by-step abstraction helps achieve invariance to, for example, translations of objects, and supports classification. A drawback, however, is that this abstraction loses the resolution and fine structural details needed in semantic segmentation.
While classification addresses what is shown in an image, semantic segmentation additionally deals with where exactly something is shown. Nonetheless, the introduction of Deep Learning techniques into semantic segmentation dramatically improved segmentation accuracy, and Deep Learning therefore became the predominant approach in this area as well. This further promoted the field, and the increasingly better results led to broad application of these approaches in commercial products.
In the following, we review prominent developments in semantic segmentation using Deep Neural Networks. In particular, we focus on U-Net-like approaches that combine convolutions with deconvolution or upsampling, as well as on Mask R-CNN, because these kinds of approaches proved to be the best performing ones in the challenge described here. For a broader overview, several surveys and reviews cover the current state of semantic segmentation. Lateef and Ruichek (2019) provide a systematic and exhaustive review of the different categories of approaches employing Deep Learning techniques and present available benchmarks and datasets as well as evaluation criteria. Hao et al. (2020) focus instead on the degree of supervision during training. More traditional approaches are summarized in Thoma (2016), and recent advances are briefly addressed in Atif et al. (2019) and Minaee et al. (2020).
A crucial first architecture was the Fully Convolutional Network (FCN) (Long et al., 2015), which can be applied to images of any size. It uses a convolutional network architecture for the first layers: blocks of convolution and max pooling layers are applied in sequence until the image is downsized to 1/32 of its original dimensions. In classification, fully connected layers would then be applied to this latent space. In FCNs, class predictions are instead made at this coarse resolution for the different small image regions. Afterwards, the predicted labels are scaled back up to the original size using a sequence of upsampling and deconvolutional layers. The downstream path collects contextual information over larger areas at a coarse resolution, while the upstream path is tasked with reconstructing more detailed spatial information. This architecture led to substantial improvements on the PASCAL VOC 2012 dataset. It has the particular advantage that it can be trained end to end, without requiring the selection or tuning of features. Architectures of this type, employing convolutional layers, are today the standard approach for semantic segmentation tasks (Lateef and Ruichek, 2019).
One disadvantage found in early approaches using DNNs for semantic segmentation is that detailed structure tends to get lost, and fine structures in images appear washed out. FCN addressed this in two ways. First, it avoided a very deep architecture, which would otherwise lead to overly large receptive fields. Second, it introduced skip connections for the last steps. Skip connections provide information from earlier layers in the processing sequence that operate at a finer resolution. The outputs of these earlier layers are used as additional input to a later stage. That stage therefore receives not only the output of the directly preceding layer but also the information rerouted from an earlier layer through the skip connections. The weights of these connections are adapted during training as well. The idea of skip connections has been further refined in U-Net type architectures and has in general been found to be quite effective (e.g., Chen et al. (2018) used shortcut connections that enhanced the results). U-Net (Ronneberger et al., 2015) is an improvement of FCN and consists of a symmetric arrangement of a contractive and an expansive path. Following a general trend toward smaller convolutional filters, each step of the contractive path consists of two 3 × 3 convolutions followed by a 2 × 2 max pooling layer. The expansive path is symmetric, but up-convolutions replace the max pooling layers. Importantly, corresponding layers of both paths are connected by skip connections (see Figure 5). These connections provide detailed information to the upscaling layer at exactly the resolution required for that layer's output in the expansive path. U-Net provides a simple architecture that has become very popular because it can be implemented quite efficiently, and introducing local information at every level of detail led to much improved results.
Many further architectures followed a general U-Net or encoder-decoder structure. For example, some replaced the processing blocks (convolutions and pooling layers) with refined and further improved blocks. Drozdzal et al. (2016) introduced residual blocks that add further skip connections inside each block as residual connections. This generally allows for deeper networks and better training, which was reflected in their results as well. As a further step, Jégou et al. (2017) applied dense blocks in each of the two streams. Each dense block consists of multiple stacked convolutional layers, where every layer receives the feature maps of all preceding layers in the block. The outputs of all layers inside the block are then aggregated at its output through skip-like connections. As a result, the output of each block combines low-level as well as high-level features. This provided further state-of-the-art results.
In general, downsampling in the downstream path (the sequence of convolution and pooling operations) aims at increasing the receptive field and taking more context into account, as required for classification. However, this increase comes at the cost of reduced spatial resolution. As an alternative, dilated convolutions (Yu and Koltun, 2015) also increase the receptive field, but without reducing spatial resolution. In a dilated (or atrous) convolution, subsequent entries of the filter are not applied to subsequent entries of the input. Instead, the filter processes only every l-th entry of the input, where the dilation factor l represents the spacing between filter entries. This increases the size of the receptive field dramatically: when the dilation factor is doubled at every layer, the receptive field grows exponentially with depth. Such layers can still be implemented reasonably efficiently as sparse convolutions. As a result, they allow contextual information to be derived at multiple scales without losing resolution (Yu and Koltun, 2015). The initial approach already showed state-of-the-art performance. One disadvantage is that dilated convolutions tend to produce gridding artifacts, which stem from the systematic structure of the constructed filters. Further improvements used spatial pyramid pooling modules, for example in DeepLab (Chen et al., 2018a) or in DeepLabv3+, which complements this with a simple decoder module (Chen et al., 2018b). A more recent alternative is FastFCN (Wu et al., 2019). Because processing dilated convolutions requires considerable memory and time, this approach started from FCN with recent improvements but replaced the dilated convolutions in the backbone with Joint Pyramid Upsampling. This proved to be more efficient while still producing good results.
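The receptive-field argument can be checked with a little arithmetic. For stride-1 convolutions with kernel size k, each layer with dilation d widens the receptive field by (k − 1)·d. The sketch below assumes 3-tap filters, 1D signals, and zero padding, and uses illustrative helper names. It compares four ordinary layers with four layers whose dilation doubles, and it shows that a "same"-padded dilated convolution keeps the input length.

```python
import numpy as np

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 convolutions with the given dilations."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d  # each layer widens the field by (k - 1) * d
    return rf

def dilated_conv1d(x, w, d):
    """'Same'-size 1D dilated cross-correlation (odd kernel length assumed).

    The filter taps are applied to input samples that are d positions apart.
    """
    k = len(w)
    pad = (k - 1) * d // 2          # zero padding that keeps the output length
    xp = np.pad(x, pad)
    return np.array([sum(w[j] * xp[i + j * d] for j in range(k)) for i in range(len(x))])

print(receptive_field(3, [1, 1, 1, 1]))  # 9  (four ordinary 3-tap layers)
print(receptive_field(3, [1, 2, 4, 8]))  # 31 (dilation doubled at each layer)
print(dilated_conv1d(np.arange(10.0), [1.0, 1.0, 1.0], d=2).shape)  # (10,)
```

With dilations 1, 2, 4, …, 2^(L−1), the receptive field is 2^(L+1) − 1. It therefore grows exponentially with depth, whereas ordinary layers make it grow only linearly.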
One advantage of the deep architectures proposed earlier was the possibility of training them end to end, but other approaches used additional pre- and post-processing. DeepLab (Chen et al., 2018a) applied Conditional Random Fields (CRFs) (Krähenbühl and Koltun, 2012) in a post-processing step, which in their case produced better object outlines in semantic segmentation. CRFs had been applied in the past as a post-processing step that takes contextual information into account well and leads to more coherent labels. CRFs were tested in one of the approaches in the challenge described here. However, such an explicit step turned out to be unnecessary, and it did not appear helpful when sufficient detail had already been trained into the Deep Neural Network (which agrees with other findings). Others have successfully integrated CRFs into the training of the whole system (Zheng et al., 2015), because CRFs tend to produce quite good results close to object boundaries.
Region proposal based methods follow a different type of approach, which has shown success in the past as well as in the results presented in this paper. Faster R-CNN (Ren et al., 2015) and Mask R-CNN (He et al., 2017) are examples of this type of architecture. These approaches consist of multiple stages. First, regions of the input image are identified, and bounding boxes for possible objects are proposed. Faster R-CNN (Ren et al., 2015) introduced a region proposal network for this purpose, which is a form of fully convolutional network. Second, features are extracted for each of these bounding boxes. Faster R-CNN realized this quite efficiently, because both stages can share features detected by convolutional layers. Mask R-CNN adds a third branch that runs in parallel with the classification of the object in each bounding box. As in an FCN, this branch uses the extracted features for a detailed pixel-wise prediction of the object mask.
For a more detailed overview see Lateef and Ruichek (2019) and Minaee et al. (2020) .
3. Dataset
The dataset used in this work was derived from the SpaceNet dataset (Spacenet on aws, 2018). It provides a good basis for comparing learning approaches on remote sensing data (for a comparable dataset, see Castillo-Navarro et al. (2019)). Instead of considering all the channels in the multiband imagery from the SpaceNet dataset, we focus only on the RGB channels (for an example of an approach that also exploits spectral information, see Ben Hamida et al. (2017)). Excluding the non-RGB channels yields an alternate version of the SpaceNet dataset that makes the problem easier and more accessible to Deep Learning researchers. These researchers may not be familiar with the tools the Remote Sensing community uses to manipulate multiband imagery, but they are usually familiar with simple RGB images, which are used extensively in Deep Learning research. Moreover, when only the RGB channels are considered, the problem becomes a direct parallel of the popular instance segmentation tasks commonly studied in Deep Learning research. At the same time, most Deep Learning approaches are flexible. If good results can be obtained using just the RGB channels, extending the same approach to the full multichannel signal can therefore be expected to improve them further. The dataset consists of a training set of 280,741 images, a validation set of 60,317 images, and a test set of 60,697 images (see Figure 1).
4. Evaluation Metrics
The evaluation was principally based on the Intersection over Union (IoU) between the predicted mask and the ground truth.
For a known ground truth mask $A$ and a predicted mask $B$, we first compute the IoU (Intersection over Union):
$$IoU(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
IoU measures the overall overlap between the true region and the proposed region.
We then count a detection as true when the overlap is at least one half (IoU ≥ 0.5).
We can then define the following quantities (TP: true positive predictions, FP: false positives, FN: false negatives):
• Precision (IoU ≥ 0.5): $\text{Precision} = \frac{TP}{TP + FP}$
• Recall (IoU ≥ 0.5): $\text{Recall} = \frac{TP}{TP + FN}$
The final scoring parameters $AP_{IoU \geq 0.5}$ (average precision) and $AR_{IoU \geq 0.5}$ (average recall) are computed by averaging all the precision and recall values over all known annotations in the ground truth.
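The following minimal NumPy sketch shows how these definitions combine for a single image. It computes mask IoU, greedily matches each prediction to the unmatched ground-truth building with the highest IoU, and counts a match as a true positive when IoU ≥ 0.5. The greedy matching order and the function names are assumptions for illustration. The challenge's official scorer may differ, for example in ordering predictions by confidence or in how it averages over images.

```python
import numpy as np

def mask_iou(a, b):
    """IoU of two boolean masks: |A ∩ B| / |A ∪ B|."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def precision_recall(gt_masks, pred_masks, thr=0.5):
    """Greedy one-to-one matching; a prediction is a TP if IoU >= thr with an unmatched GT."""
    matched_gt = set()
    tp = 0
    for p in pred_masks:
        best_j, best_iou = None, 0.0
        for j, g in enumerate(gt_masks):
            if j in matched_gt:
                continue
            iou = mask_iou(g, p)
            if iou > best_iou:
                best_j, best_iou = j, iou
        if best_j is not None and best_iou >= thr:
            matched_gt.add(best_j)
            tp += 1
    fp = len(pred_masks) - tp   # predictions without a matching building
    fn = len(gt_masks) - tp     # buildings that no prediction matched
    precision = tp / (tp + fp) if pred_masks else 0.0
    recall = tp / (tp + fn) if gt_masks else 0.0
    return precision, recall

# Two ground-truth buildings; one exact prediction and one spurious prediction.
g1 = np.zeros((10, 10), dtype=bool); g1[0:4, 0:4] = True
g2 = np.zeros((10, 10), dtype=bool); g2[6:10, 6:10] = True
p1 = g1.copy()
p2 = np.zeros((10, 10), dtype=bool); p2[0:2, 6:8] = True
precision, recall = precision_recall([g1, g2], [p1, p2])  # (0.5, 0.5)
```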
![FIGURE 1](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g001.jpg)
FIGURE 1. Sample images from the Mapping Challenge Dataset, showing top-down views of satellite imagery.
![FIGURE 2](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g002.jpg)
FIGURE 2. Schematic overview of the U-Net pipeline used in the first approach (Section 5). Nodes denote computational steps; arrows denote data flow. Please refer to Table 1 for a description of each node.
![FIGURE 3](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g003.jpg)
FIGURE 3. Each row represents a randomly sampled tile from the dataset. The 1st column shows the input RGB image. The 2nd column shows the ground truth mask. The 3rd column visualizes the pixel-wise weight computed from the (inverse of the) distance of a pixel to the two nearest buildings; high values correspond to pixels between nearby buildings. The 4th column visualizes the pixel-wise weight computed from the (inverse of the) size of an instance; high values denote small buildings (the smaller the building, the darker the color). Note that the background is fixed to black in both weight component visualizations.
![TABLE 1](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-t001.jpg)
TABLE 1. Experiment results for the U-Net adaptation using Fast Learnings for Fast Mapping (Section 6).
5. Instance Segmentation Using Customized U-Net
As a first approach, we propose a two-stage solution. The first stage is a neural network based on U-Net (Ronneberger et al., 2015), followed by a post-processing stage using gradient boosting (Ke et al., 2017). Figure 2 presents the entire U-Net based pipeline, which takes raw RGB images along with some previously calculated metadata as input and predicts the instance segmentation masks.
(1) Customized U-Net based Neural Network: Inspired by Iglovikov et al. (2018), we experimented with U-Net using ResNet34, ResNet101 (He et al., 2016), and ResNet152 as encoders, and obtained the best results with ResNet101.
(2) Loss Design: The initial experiments quickly showed us the importance of closely placed buildings and the tendency of the initial networks to group closely placed buildings into a single instance. The models also struggled with small instances (buildings with a small area in the image). These instances are nonetheless equally important for the final evaluation metric, because the metric treats all instances equally when computing Average Precision and Recall. In many cases, an instance annotation was represented by barely a few pixels. Such annotations belonged to buildings that extended over from an adjacent tile, so only a small part of the building was visible (and annotated) in the current tile. To make the model robust to these issues, we designed a custom loss function with two weighting factors (see Figure 3 for a visualization of the weighting factors). The first factor puts a higher weight on pixels that are close to other buildings, and the second factor puts a higher weight on pixels that belong to small objects. Equation 1 gives the loss function used in this approach. Let:
• $x$ be the predictions from U-Net,
• $y$ be the ground truth,
• $Loss_{ce}$ be the Cross Entropy Loss,
• $Loss_{dice}$ be the Dice Loss,
• $W_{ce}$ be the weight assigned to the Cross Entropy Loss,
• $W_{dice}$ be the weight assigned to the Dice Loss,
• $W_d$ be the distance weights (distance to the two closest instances),
• $W_s$ be the size weights.
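Then $W = W_d \cdot W_s$ (taken element-wise, per pixel) is the pixel weight, and the loss function is defined as:
$$Loss(x, y) = W_{ce} \cdot Loss_{ce}(x, y, W) + W_{dice} \cdot Loss_{dice}(x, y) \tag{1}$$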
The loss is the sum of two losses, the Cross Entropy Loss and the Dice Loss, each weighted by a real number picked from [0, 1]. The Cross Entropy Loss component is additionally weighted by the pixel weight (the element-wise product of the distance weight and the size weight). This penalizes misclassifications on pixels belonging to small objects and closely located instances. It lets us jointly optimize the model's ability to distinguish between two closely located buildings and its ability to segment smaller instances. Figure 3 shows a visualization of the individual pixel weight components used in the custom loss function. These custom adaptations to the loss function significantly improved the performance of our model.
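Equation 1 can be sketched in NumPy as follows. The sketch assumes a binary building/background output, so binary cross entropy stands in for the cross-entropy term, and it uses a soft Dice loss computed on probabilities. The default `w_ce=1.0` is a hypothetical choice. The default `w_dice=0.5` matches the dice weight used in the training scheme that follows. This is not the authors' code.

```python
import numpy as np

def weighted_ce_dice_loss(x, y, pixel_weight, w_ce=1.0, w_dice=0.5, eps=1e-7):
    """Loss = w_ce * Loss_ce(x, y, W) + w_dice * Loss_dice(x, y), binary case.

    x: predicted building probabilities, shape (H, W).
    y: binary ground-truth mask (0/1), shape (H, W).
    pixel_weight: W = W_d * W_s (element-wise), shape (H, W).
    w_ce: hypothetical default; w_dice: dice weight from the training scheme.
    """
    x = np.clip(x, eps, 1.0 - eps)
    # Per-pixel binary cross entropy, scaled by the pixel weight W.
    ce = -(y * np.log(x) + (1.0 - y) * np.log(1.0 - x))
    loss_ce = np.mean(pixel_weight * ce)
    # Soft Dice loss on probabilities: 1 - 2|X ∩ Y| / (|X| + |Y|).
    intersection = np.sum(x * y)
    loss_dice = 1.0 - (2.0 * intersection + eps) / (np.sum(x) + np.sum(y) + eps)
    return w_ce * loss_ce + w_dice * loss_dice

y = np.zeros((8, 8)); y[2:5, 2:5] = 1.0
ones = np.ones_like(y)
print(weighted_ce_dice_loss(y.copy(), y, ones))                # close to 0 for a perfect mask
print(weighted_ce_dice_loss(np.full_like(y, 0.5), y, 2 * ones))  # higher pixel weight, higher loss
```

Only the cross-entropy term sees `pixel_weight`. Raising the weight on pixels between buildings or inside small buildings therefore changes the per-pixel gradient there, while the Dice term keeps optimizing global overlap.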
(3) Training Scheme: We used the following multi-stage training scheme, with pre-trained models (as available in PyTorch) as starting points, to obtain better results (for both Average Precision and Average Recall):
(1) Initialize the model with pre-trained weights,
(2) Train on a 50,000-tile subset of the training set with learning rate = $10^{-4}$ and dice weight = 0.5,
(3) Train on the full dataset with learning rate = $10^{-4}$ and dice weight = 0.5,
(4) Train with a reduced learning rate = $10^{-5}$ and dice weight = 0.5,
(5) Train with a 10-fold increase in the dice weight (5.0) to make the final predictions smoother.
(4) Pre Processing:
• For each pixel, the distances to the two closest instances are computed to create the distance map used for weighting the loss function.
• A size mask is produced for each image, encoding information about object size.
• Small masks on the edges of the image were dropped.
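The two weight maps can be sketched as follows. The text describes $W_d$ only as derived from the (inverse) distance to the two nearest buildings. The exponential form below is therefore an assumption borrowed from the original U-Net paper (Ronneberger et al., 2015), with its w0 = 10 and σ = 5 pixels, and with a constant 1 in place of U-Net's class-balancing term. The size weight formula `1 + alpha * mean_area / area` is purely hypothetical. Distances are computed by brute force, which is fine for small tiles only.

```python
import numpy as np

def distance_weights(instance_masks, w0=10.0, sigma=5.0):
    """U-Net-style weight 1 + w0 * exp(-(d1 + d2)^2 / (2 * sigma^2)).

    d1 and d2 are each pixel's distances to the nearest and second-nearest
    instance. Brute force, so only suitable for small tiles; needs >= 2 instances.
    """
    h, w = instance_masks[0].shape
    yy, xx = np.mgrid[0:h, 0:w]
    per_instance = []
    for m in instance_masks:
        ys, xs = np.nonzero(m)
        # Euclidean distance from every pixel to the closest pixel of this instance.
        d = np.sqrt((yy[..., None] - ys) ** 2 + (xx[..., None] - xs) ** 2).min(axis=-1)
        per_instance.append(d)
    d_sorted = np.sort(np.stack(per_instance), axis=0)
    d1, d2 = d_sorted[0], d_sorted[1]
    return 1.0 + w0 * np.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))

def size_weights(instance_masks, alpha=1.0):
    """Hypothetical size weight: pixels of instance i get 1 + alpha * mean_area / area_i."""
    areas = np.array([m.sum() for m in instance_masks], dtype=float)
    weights = np.ones(instance_masks[0].shape)  # background pixels keep weight 1
    for m, area in zip(instance_masks, areas):
        weights[m] = 1.0 + alpha * areas.mean() / area
    return weights

# Two buildings on a 20 x 20 tile: a 4 x 4 block and a small 2 x 2 block nearby.
big = np.zeros((20, 20), dtype=bool); big[2:6, 2:6] = True
small = np.zeros((20, 20), dtype=bool); small[2:4, 8:10] = True
W = distance_weights([big, small]) * size_weights([big, small])  # W = W_d * W_s
```

In `W`, the gap between the two buildings (e.g. pixel (3, 7)) receives a large weight, and pixels far from both buildings stay close to 1. Pixels of the small building are also up-weighted relative to the large one.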
(5) Post Processing:
• Test-time augmentation: we made predictions on image rotations (90°, 180°, 270°) and flips (up-down, left-right) and used the geometric mean of the predictions as the final result.
• Second-level model: we finally used gradient boosting, via LightGBM, to train a separate model on the first-stage output that computes the final prediction masks.
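Test-time augmentation with a geometric mean can be sketched as follows. Each transformed input is passed to the model, each output is mapped back to the original orientation, and the mean is taken in log space. The `predict` callable is a hypothetical stand-in for the trained network. Including the unmodified image alongside the listed rotations and flips is also an assumption.

```python
import numpy as np

def tta_geometric_mean(image, predict, eps=1e-7):
    """Geometric mean of predictions over identity, 90/180/270° rotations and flips.

    image: (H, W, C) array. predict: hypothetical callable mapping an (H', W', C)
    image to an (H', W') probability map. Each output is transformed back to the
    original orientation before averaging.
    """
    transforms = [  # (forward transform on the image, inverse on the prediction)
        (lambda a: a, lambda p: p),
        (lambda a: np.rot90(a, 1), lambda p: np.rot90(p, -1)),
        (lambda a: np.rot90(a, 2), lambda p: np.rot90(p, -2)),
        (lambda a: np.rot90(a, 3), lambda p: np.rot90(p, -3)),
        (np.flipud, np.flipud),
        (np.fliplr, np.fliplr),
    ]
    log_sum = np.zeros(image.shape[:2])
    for forward, inverse in transforms:
        prob = inverse(predict(forward(image)))
        log_sum += np.log(np.clip(prob, eps, 1.0))  # clip avoids log(0)
    return np.exp(log_sum / len(transforms))

# A rotation-equivariant toy "model" returns the same map after TTA.
rng = np.random.default_rng(0)
img = rng.uniform(0.1, 1.0, size=(4, 6, 3))
prob_map = tta_geometric_mean(img, lambda a: a[..., 0])
```

Compared with the arithmetic mean, the geometric mean is pulled down strongly by any single low-confidence prediction. This makes a pixel count as a building only when most views agree.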
Our final performance on the held-out test set was an $AP_{IoU \geq 0.5}$ of 0.938 and an $AR_{IoU \geq 0.5}$ of 0.946. Figure 4 shows examples of some predictions made by the trained model.
![FIGURE 4](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g004.jpg)
FIGURE 4. Each row represents a randomly sampled tile from the validation set. The 1st column shows the input RGB image; the 2nd column shows the model's prediction; the 3rd column shows the model's prediction superimposed on the input image; and the 4th column shows the ground truth superimposed on the input image.
6. Fast Learnings for Fast Mapping Using U-Net
In this section, we approach the problem by reframing it as a binary semantic segmentation task. To summarize, we preprocessed ground truth labels into binary masks, trained with a 1-cycle learning rate schedule on a U-Net ( Ronneberger et al., 2015 ) inspired architecture with an ImageNet-pretrained ResNeXt50 ( Xie et al., 2016 ) encoder, and post-processed the probability maps into polygonized building footprint instances.
We recognized upfront that the main evaluation metric was average precision at 0.5 IoU. This meant that “good enough” mask predictions, which overlap each true building by more than 50%, mattered more than obtaining the highest possible pixel-wise accuracy. We also kept in mind that buildings appear at different scales. The smallest buildings, and portions of buildings such as those cut off at the borders of an image chip, would be the most challenging to predict well. These considerations factored into our decision to use a U-Net architecture with a custom loss function that works well at multiple scales and balances pixel-wise with global IoU cost optimization.
In addition to the overall objectives, we also aimed to refine our model development process to work best with fast, lightweight models and rapid experimentation on small datasets. These criteria are equally important for planet-monitoring work in the field, where models need to be versatile in production and robust to highly diverse datasets and use cases.
We quickly learned that experiments on the full dataset would take 1 day each because of the large size of the training set (260k images) and test set (60k images). Our experimentation process therefore emphasized many rapid and comparable iterations on small, downsized samples of the full dataset, to minimize training time per experiment. We used a smaller sample of the data (6k training, 1.5k validation, and 1.8k test images). We confirmed that training and local evaluation on this sample correlated well with performance on the full training and test sets. Using this smaller sample reduced training time from 2–3 h/epoch to 3–5 min/epoch. This roughly 40× speed-up helped significantly, but a single experiment could still take 1.5 h or more (training for 30 epochs).
To further reduce the time per experiment, at the expense of accuracy, we downsized images to 64 × 64, 128 × 128, and 256 × 256 and benchmarked performance at each size. The benchmarks recorded the number of epochs needed to reach a certain loss or metric value, the best score at the end of training, and the number of epochs to convergence. With these downsized samples and their benchmarks, we could test new ideas in as little as 7 min per experiment (64 × 64 images trained for 30 epochs) and consistently compare new experimental results against each other. The most promising experimental settings were then benchmarked on the full training data and evaluated against the held-out test set. We also regularly inspected predictions visually against the ground truth and made qualitative notes about common failure cases.
For experimentation, we used one remote GPU instance (Nvidia Quadro P6000 with 24 GB of GPU memory), PyTorch 0.3 with the Fast.ai library, and Anaconda Jupyter notebooks to run experiments and document results.
6.1. Pre Processing
Pre-processing the training data consisted of reflect-padding images to 320 × 320, which increased the visible area of buildings cut off at the sides and corners of each tile. Polygon ground truth labels were converted to binary pixel masks without any other modifications. The data was augmented with random vertical/horizontal flips, 90 ± 4° rotations, and slight changes in image brightness and contrast.
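The padding and augmentation steps can be sketched in NumPy as follows. The sketch assumes float images scaled to [0, 1]. The jitter ranges are hypothetical, and the small ±4° rotation jitter is omitted because it needs an interpolating rotation routine. `np.pad(..., mode="reflect")` mirrors the tile without repeating its edge row or column.

```python
import numpy as np

def reflect_pad_to(image, size=320):
    """Reflect-pad an (H, W, C) tile symmetrically to (size, size, C)."""
    h, w = image.shape[:2]
    top, left = (size - h) // 2, (size - w) // 2
    pad = ((top, size - h - top), (left, size - w - left), (0, 0))
    return np.pad(image, pad, mode="reflect")

def augment(image, mask, rng):
    """Random flips and 90° rotations (image and mask together), plus a mild
    brightness/contrast jitter applied to the image only."""
    if rng.random() < 0.5:
        image, mask = np.flipud(image), np.flipud(mask)
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    k = int(rng.integers(4))                  # number of quarter turns
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    contrast = rng.uniform(0.9, 1.1)          # hypothetical jitter ranges
    brightness = rng.uniform(-0.05, 0.05)
    mean = image.mean()
    image = np.clip((image - mean) * contrast + mean + brightness, 0.0, 1.0)
    return image, mask

tile = np.random.default_rng(1).uniform(size=(300, 300, 3))
padded = reflect_pad_to(tile, 320)            # shape (320, 320, 3)
mask = np.random.default_rng(3).random((320, 320)) > 0.5
aug_img, aug_mask = augment(padded, mask, np.random.default_rng(2))
```

Applying the same geometric transform to the image and the mask keeps the labels aligned. The photometric jitter touches only the image.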
As seen in Figure 5, the model architecture is U-Net inspired, with an ImageNet-pretrained ResNeXt50 encoder (weights from the PyTorch/Fast.ai library). Key characteristics are:
• features are extracted at the end of each pretrained ResNeXt block,
• convolutions within the cross-connections at each feature map size (160, 80, 40, 20, 10) halve the number of channels,
• upsampling uses transposed convolutions (deconvolution).
![FIGURE 5](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g005.jpg)
FIGURE 5. Architecture sketch for the U-Net inspired model using a ResNeXt50 encoder as described in Section 6.
6.2. Training
The loss function is an equally weighted combination of binary cross-entropy loss and soft Dice loss. Empirically, this combination produced better results than either loss alone.
We trained the model with 1-cycle and cyclical learning rate schedules, which were first introduced in Leslie Smith's research on faster neural network training with very high learning rates (Leslie and Topin, 2017). In recent practice, the Fast.ai team successfully employed 1-cycle learning in Stanford's DAWNBench competition (Stanford DAWNBench, 2018). There it produced the fastest and cheapest training of CIFAR-10 and ImageNet classification models to the benchmark performance levels (Howard, 2018). 1-cycle training is implemented directly in the Fast.ai library, and the team has documented its experimental usage well (Gugger, 2017b).
We used a batch size of 32 and Stochastic Gradient Descent with momentum as the optimizer. We first warmed up the model by training the non-pretrained decoder layers for 1 epoch at a learning rate of 6. We then unfroze all weights and started 1-cycle training for 20 epochs, as seen in Figure 6. The learning rate started at 0.15, increased linearly to 6 by 45% of the way through training, decreased linearly back to 0.15 by 90%, and decayed to 0.0015 over the last 10% of training. Momentum was scaled inversely to the learning rate over the same schedule. Figures 7 and 8 show the resulting learning curves and example outputs.
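The 1-cycle schedule described above can be written as a small function of training progress. The sketch assumes that the final decay is linear. The momentum range (0.95 down to 0.85 and back) is a hypothetical choice, because the text only says that momentum moves inversely to the learning rate.

```python
def one_cycle(progress, lr_min=0.15, lr_max=6.0, lr_final=0.0015,
              mom_max=0.95, mom_min=0.85):
    """Learning rate and momentum at `progress` in [0, 1] of training.

    0-45%: lr rises linearly lr_min -> lr_max (momentum falls);
    45-90%: lr falls back to lr_min (momentum rises);
    90-100%: lr decays linearly to lr_final (momentum held at mom_max).
    The momentum range is a hypothetical choice.
    """
    if progress < 0.45:
        t = progress / 0.45
        return lr_min + t * (lr_max - lr_min), mom_max - t * (mom_max - mom_min)
    if progress < 0.90:
        t = (progress - 0.45) / 0.45
        return lr_max - t * (lr_max - lr_min), mom_min + t * (mom_max - mom_min)
    t = (progress - 0.90) / 0.10
    return lr_min - t * (lr_min - lr_final), mom_max

for p in (0.0, 0.45, 0.9, 1.0):
    lr, mom = one_cycle(p)
    print(f"{p:.2f}: lr={lr:.4f}, momentum={mom:.3f}")
```

Recent PyTorch versions also provide `torch.optim.lr_scheduler.OneCycleLR`. With `three_phase=True`, `pct_start=0.45`, `max_lr=6`, `div_factor=40`, `final_div_factor=100`, and `anneal_strategy='linear'`, it should trace a similar curve, though this is worth checking against the documentation of the installed version.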
![FIGURE 6](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g006.jpg)
FIGURE 6. The 1-cycle schedule for learning rate and momentum over training time (as described in Section 6).
![FIGURE 7](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g007.jpg)
FIGURE 7. Learning curves from experiments for the U-Net adaptation using Fast Learnings for Fast Mapping (as described in Section 6).
![satellite image processing thesis titles www.frontiersin.org](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g008.jpg)
FIGURE 8 . Example outputs from probability map to polygonized prediction compared with the ground truth (Fast Learnings for Fast Mapping, Section 6).
After 20 epochs with this 1-cycle schedule, we trained for two more 10-epoch cycles with a cyclical learning rate schedule (the learning rate rising from 0.05 to 1 and returning to 0.05 over each cycle).
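A minimal sketch of such a cyclical schedule follows. The values 0.05 and 1 and the 10-epoch cycle length are from the text; the linear (triangular) ramps with the peak at mid-cycle are an assumption, since the shape is not stated.

```python
def triangular_lr(epoch: float, cycle_len: float = 10.0,
                  lr_min: float = 0.05, lr_max: float = 1.0) -> float:
    """Learning rate at a (fractional) epoch within repeated triangular cycles.

    Linear ramps with the peak at mid-cycle are an assumption; the text only gives
    the start/peak/end values and the cycle length.
    """
    phase = (epoch % cycle_len) / cycle_len      # position inside the current cycle, in [0, 1)
    distance_from_peak = abs(2.0 * phase - 1.0)  # 1 at cycle start, 0 at mid-cycle
    return lr_min + (lr_max - lr_min) * (1.0 - distance_from_peak)


if __name__ == "__main__":
    # Epochs counted from the start of the two extra cycles.
    print([round(triangular_lr(e), 3) for e in range(0, 21, 5)])
```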
Other key training techniques we employed (available out of the box in the Fast.ai library) include gradient clipping to reduce the risk of exploding gradients, a learning rate finder that selects, up front, the highest learning rate that does not cause divergence (Smith, 2017; Gugger, 2017a), and discriminative fine-tuning (Howard and Ruder, 2018), which trains earlier layers at smaller learning rates than later layers.
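Fast.ai handles both of the last two ideas internally; the plain-Python sketch below only illustrates them and is not the library's code. Spreading the per-group rates geometrically between a first and a last value, the number of layer groups, and the clipping threshold are illustrative assumptions.

```python
import math
from typing import List


def discriminative_lrs(lr_first: float, lr_last: float, n_groups: int) -> List[float]:
    """Geometrically spaced learning rates, smallest for the earliest layer group."""
    if n_groups < 1:
        raise ValueError("need at least one layer group")
    if n_groups == 1:
        return [lr_last]
    ratio = (lr_last / lr_first) ** (1.0 / (n_groups - 1))
    return [lr_first * ratio ** i for i in range(n_groups)]


def clip_by_global_norm(grads: List[float], max_norm: float) -> List[float]:
    """Rescale a flattened gradient so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * (max_norm / norm) for g in grads]


if __name__ == "__main__":
    # Three hypothetical layer groups: encoder stem, encoder body, decoder.
    print(discriminative_lrs(1e-4, 1e-2, 3))
    print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))
```

Scaling the whole gradient vector (rather than clipping each component) preserves its direction, so a clipped step still points downhill.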
6.3. Post-processing
For inference, we performed 8× test-time augmentation (all combinations of flips and 90° rotations) and took the geometric mean of the eight outputs to create the probability map. We applied a probability threshold to create binary masks, labeled each separate mask as a building footprint instance, and converted the instances to the polygon submission format. During polygonization, we filtered out very small areas (less than 15 pixel²) to reduce false positives. The confidence score of each building instance was the average pixel-wise probability over the area of that instance.
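The test-time augmentation and the confidence score can be sketched in plain Python as below. For simplicity the "image" is a single-channel grid and `model` is a hypothetical callable returning a same-sized probability grid; each prediction is mapped back to the original orientation before the eight outputs are fused with a geometric mean.

```python
import math
from typing import Callable, List, Tuple

Grid = List[List[float]]


def rot90(grid: Grid) -> Grid:
    """Rotate a 2D grid by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def rot90_k(grid: Grid, k: int) -> Grid:
    for _ in range(k % 4):
        grid = rot90(grid)
    return grid


def hflip(grid: Grid) -> Grid:
    return [row[::-1] for row in grid]


def tta_geometric_mean(image: Grid, model: Callable[[Grid], Grid], eps: float = 1e-7) -> Grid:
    """Predict on the 8 flip/rotation variants, undo each transform, and fuse by geometric mean."""
    aligned = []
    for flip in (False, True):
        for k in range(4):
            x = rot90_k(hflip(image) if flip else image, k)
            y = rot90_k(model(x), 4 - k)              # undo the rotation
            aligned.append(hflip(y) if flip else y)   # undo the flip
    h, w = len(image), len(image[0])
    return [[math.exp(sum(math.log(max(a[r][c], eps)) for a in aligned) / len(aligned))
             for c in range(w)] for r in range(h)]


def instance_confidence(prob_map: Grid, pixels: List[Tuple[int, int]]) -> float:
    """Confidence of one building instance: mean probability over its (row, col) pixels."""
    return sum(prob_map[r][c] for r, c in pixels) / len(pixels)
```

Compared with an arithmetic mean, the geometric mean pulls a pixel's fused probability down sharply whenever any single view is confident it is background.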
As reported in Table 2, our final performance on the held-out test set was an $AP_{IoU \geq 0.5}$ of 0.918 and an $AR_{IoU \geq 0.5}$ of 0.929, with a model trained for 40 epochs in total: 20 epochs of 1-cycle training followed by 2 × 10 epochs with a cyclical learning rate. After only the 20 epochs of 1-cycle training, average precision and recall were already close to their best values, at 0.917 and 0.922. It is worth considering whether the extra 20 epochs are worth this small gain, or whether a single 1-cycle schedule over 40 epochs would instead achieve even better results.
![Table 2](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-t002.jpg)
TABLE 2. An overview of experimental results: performance of the different architectures when using different loss functions during training (for the adapted U-Net architecture, Section 8).
Toward our external objective of developing fast, lightweight models that achieve top-5 performance, training took 1.6 days (1.9 h/epoch × 20 epochs), plus another 1.6 days for the extra 20 epochs, which may not have been necessary. Inference over the full held-out test set took 0.5 h per test-time augmentation run.
7. Instance Segmentation Using Deeper U-Nets
U-Net, as also referenced in Section 5, is an encoder-decoder network for semantic segmentation that originated in medical image segmentation. The model generates a mask for the whole image, which then needs to be split into individual sub-masks for the separate buildings.
The model architecture used in this approach is shown in Figure 9. The left half of the network (encoder) is similar to a CNN and learns a low-dimensional dense representation of the input; the right half (decoder) then up-samples the learned feature representations to the shape of the input. The shortcut connections let information flow from the encoder to the decoder and help the network preserve spatial information. As the work of Li et al. (2017) has impressively shown, U-Nets benefit greatly from a deeper architecture, which allows the model to produce much more detailed segmentations; the predictions of a deeper U-Net tend to be more accurate particularly near object borders. We used a deep U-Net architecture first proposed by Giannakopoulos (2017).
![Figure 9](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g009.jpg)
FIGURE 9. Architecture of the Deeper U-Net model (described in Section 7).
The inputs are zero-padded to 512 × 512 so that 2 × 2 max pooling can be applied repeatedly without producing odd image sizes. Symmetric padding would have been an alternative to zero padding. However, the training images are crops of a larger satellite image, and, as mentioned in the previous sections, the smallest and most difficult objects to predict are buildings that spill over from an adjacent tile. Symmetric padding would mirror these partial buildings across the border and thus increase their area.
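A small sketch of why 512 is convenient: a 300-pixel side can be halved only twice before it becomes odd (300, 150, 75), whereas 512 can be halved nine times. Centring the original content inside the padded canvas is our assumption; the text only states that zero padding to 512 × 512 was used, and a single channel stands in for the RGB input.

```python
from typing import List

Grid = List[List[float]]


def zero_pad_to(grid: Grid, size: int = 512) -> Grid:
    """Zero-pad a single-channel grid to size x size, centring the original content.

    Centring is an assumption; the text only says inputs are zero-padded to 512 x 512.
    """
    h, w = len(grid), len(grid[0])
    if h > size or w > size:
        raise ValueError("input is larger than the target size")
    top, left = (size - h) // 2, (size - w) // 2
    out = [[0.0] * size for _ in range(size)]
    for r in range(h):
        out[top + r][left:left + w] = grid[r]
    return out


def even_halvings(n: int) -> int:
    """How many times a side of length n can be halved by 2 x 2 pooling before it turns odd."""
    count = 0
    while n > 1 and n % 2 == 0:
        n //= 2
        count += 1
    return count


if __name__ == "__main__":
    print(even_halvings(300), even_halvings(512))  # 2 9
```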
The loss function combines Dice loss with binary cross-entropy.
The model was trained from scratch for 215 epochs using RMSprop with a learning rate of 0.0001. Training took roughly 5.5 h per epoch on a single NVIDIA TITAN Xp GPU. Because of the long training duration, no image augmentation was used. Interestingly, the model did not start to overfit even though no dropout was used, likely because of the large training set. The learning curves are displayed in Figure 10. Note that the IoU shown in this figure is calculated at the pixel level and is therefore not the same as the IoU in the overall evaluation metric.
![Figure 10](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g010.jpg)
FIGURE 10. Learning curves showing loss and IoU over training time for the Deeper U-Net model (Section 7).
The model produces, for every pixel, an estimate of the probability that it belongs to an object of interest (a building). We used a threshold of 0.5: pixels with a probability greater than 0.5 were classified as building. The resulting masks were already well separated and were easily turned into single-building masks by assigning each group of connected pixels its own label. After this step, masks with fewer than 25 pixels were deleted, which removes small artifacts that should not be counted as buildings. This cutoff was determined empirically by examining the distribution of the areas of small objects in the dataset. Finally, we computed a bounding box for every building mask; the building masks and bounding boxes together form the final predictions.
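The post-processing steps above can be sketched with a breadth-first flood fill. The threshold of 0.5 and the 25-pixel cutoff come from the text; using 4-connectivity to decide which pixels are "connected" is our assumption, as the section does not specify it.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def extract_buildings(prob_map: List[List[float]], threshold: float = 0.5,
                      min_pixels: int = 25) -> List[Tuple[List[Pixel], Box]]:
    """Threshold, label 4-connected components, drop small ones, and return (pixels, box) pairs."""
    h, w = len(prob_map), len(prob_map[0])
    mask = [[p > threshold for p in row] for row in prob_map]
    seen = [[False] * w for _ in range(h)]
    instances = []
    for r0 in range(h):
        for c0 in range(w):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            pixels: List[Pixel] = []
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:                      # breadth-first flood fill of one component
                r, c = queue.popleft()
                pixels.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w and mask[nr][nc] and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(pixels) < min_pixels:      # remove small artifacts
                continue
            rows = [p[0] for p in pixels]
            cols = [p[1] for p in pixels]
            instances.append((pixels, (min(rows), min(cols), max(rows), max(cols))))
    return instances
```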
Our final performance on the held-out test set was an $AP_{IoU \geq 0.5}$ of 0.930 and an $AR_{IoU \geq 0.5}$ of 0.956.
Figure 11 shows a prediction of the Deeper U-Net for a sample image from the validation set.
![Figure 11](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g011.jpg)
FIGURE 11. Comparison of masks drawn by a human (A) and predicted by the Deeper U-Net model (B).
The generated masks are very accurate, and we expect the model to perform well on other categories too, such as roads, trees, crops, rivers, and lakes. This could make today’s Mapathons, in which volunteers draw maps from satellite images, completely obsolete. Instead of taking days or even weeks, maps could be generated in just a few hours from satellite imagery or drone footage. This is crucial for emergency preparedness actors who go to remote areas where no maps exist; up-to-date maps help them work efficiently in crisis situations such as an earthquake.
8. Comparing an Adapted U-Net Architecture for Varying Depths
In this section, we analyze and compare a U-Net-like structure (Ronneberger et al., 2015) at different depths. The architecture was originally derived from a convolutional autoencoder structure as used for reconstructing images (see, for example, the Keras tutorial, Chollet, 2015). This autoencoder-type architecture was modified for semantic segmentation: the provided ground-truth annotations of the buildings were used as training targets so that the network learns to detect buildings. Furthermore, we introduced skip connections, as found in U-Net, that connect encoding and decoding blocks on the same level. These connections help to recover spatial information, and in our experiments they reconstructed image details better than post-processing with, for example, conditional random fields (Krähenbühl and Koltun, 2012). This architecture differs from U-Net in two respects. First, the sequence of operations inside the decoding blocks differs: following the autoencoder approach, each decoder block mirrors the encoder block exactly and consists of a single convolution followed by upsampling (where the encoder block uses max pooling). Second, each block uses a single 5 × 5 convolution instead of the two 3 × 3 convolutions per level of the original U-Net. We evaluated this architecture at different depths (numbers of stacked encoder and decoder blocks).
We first describe the pre- and post-processing applied to the dataset. We then introduce our network architecture in Section 8.3 and explain our training procedure in Section 8.4. Section 8.5 presents results comparing variations of our architecture, which are then summarized and discussed in Section 10.
8.1. Pre-processing
We normalize each input image individually by subtracting its mean and dividing by its standard deviation.
For early tests, we downsized the images from 300 × 300 pixels to 128 × 128 pixels to avoid long training times. For our final models, we used the full resolution so as not to lose any detail.
8.2. Post-processing
For each of the 300 × 300 pixels in an input image, the network yields a pseudo-probability between 0 and 1, where low values correspond to background (i.e., no building) and high values correspond to foreground (i.e., building).
We binarize these values with a threshold θ, assigning 1 if a value is greater than θ and 0 otherwise.
To suppress noise in the background, we perform a morphological opening (as provided by OpenCV; Bradski, 2000) with a radius of 1, that is, an erosion (which takes the minimum value over a neighborhood) followed by a dilation (which takes the maximum value). This also helps separate closely connected buildings.
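The binarization and opening can be sketched in plain Python as follows; this is an illustration, not OpenCV's implementation. We assume "radius 1" means a square 3 × 3 neighborhood, and pixels outside the image are simply ignored. The usage example shows two buildings joined by a one-pixel bridge being separated.

```python
from typing import Callable, List, Sequence

Mask = List[List[int]]


def binarize(prob_map: List[List[float]], theta: float) -> Mask:
    """1 where the pseudo-probability exceeds theta, else 0."""
    return [[1 if p > theta else 0 for p in row] for row in prob_map]


def _morph(mask: Mask, radius: int, reduce: Callable[[Sequence[int]], int]) -> Mask:
    """Apply min (erosion) or max (dilation) over a square (2r+1) x (2r+1) window.

    Pixels outside the image are ignored rather than padded (an assumption).
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = [mask[rr][cc]
                      for rr in range(max(0, r - radius), min(h, r + radius + 1))
                      for cc in range(max(0, c - radius), min(w, c + radius + 1))]
            out[r][c] = reduce(window)
    return out


def opening(mask: Mask, radius: int = 1) -> Mask:
    """Morphological opening: erosion (neighborhood minimum) followed by dilation (maximum)."""
    return _morph(_morph(mask, radius, min), radius, max)


if __name__ == "__main__":
    probs = [[0.9, 0.9, 0.9, 0.2, 0.9, 0.9, 0.9],
             [0.9, 0.9, 0.9, 0.7, 0.9, 0.9, 0.9],
             [0.9, 0.9, 0.9, 0.2, 0.9, 0.9, 0.9]]
    for row in opening(binarize(probs, theta=0.5)):
        print(row)  # the single-pixel bridge in column 3 is removed
```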
In initial experiments, we tried applying CRFs for post-processing (Krähenbühl and Koltun, 2012), as they have been used successfully for semantic segmentation before. In a CRF, the output values depend directly on characteristics of the input image, such as edges. Efficient inference in fully connected CRF models yields much finer structure in the output. While in many applications this allows small details to be recognized, it did not help in our use case of detecting buildings in aerial images; in fact, the results were worse when applying a CRF. After looking closely at the dataset, we noticed that the desired output maps do not contain particularly fine structures, but are instead fairly regular and in most cases contain rectangular buildings. Aerial images, by contrast, contain fine details (e.g., created by shadows), which CRF-based post-processing tries to incorporate. This seemed to damage the overall performance in our case. Instead, for post-processing, it proved more important to establish the general structure of buildings, which meant filling small holes (which may arise, for example, from occlusion) inside larger areas that had been recognized as a building.
8.3. Network Architecture
Our network consists of a series of encoding blocks followed by the same number of decoding blocks (see Figure 12). We tried networks with 6, 8, 10, and 12 blocks, which we refer to as Net6, Net8, Net10, and Net12, respectively. Performance improved with depth, i.e., with the number of blocks, and our best evaluated results were obtained with Net12.
![Figure 12](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g012.jpg)
FIGURE 12. Schematic representation of our U-Net-based network architecture (Net10): a sequence of encoding blocks (in blue) on the left and a corresponding sequence of decoding blocks (in green) on the right, with skip connections (dashed arrows) between them. Inside each block, we indicate the size of the internal representation at that stage. The setup of all encoding and decoding blocks is shown at the bottom (Comparing an Adapted U-Net Architecture for Varying Depths, Section 8).
The composition of individual blocks always follows the same structure:
• Encoding blocks consist of a 5 × 5 convolution layer (padded, with a stride of 1 × 1) with an increasing number of filters (64, 128, 192, 256, and 320), followed by a rectified linear unit (ReLU). The resulting features are then normalized using batch normalization (Ioffe and Szegedy, 2015) and downsampled by a 2 × 2 max pooling operation.
• Decoding blocks are symmetric to the encoding blocks, also using a 5 × 5 convolution (padded, with a stride of 1 × 1), followed by a ReLU, batch normalization, and a 2 × 2 upsampling operation that resizes the low-dimensional features of the previous layer. In addition, the decoding blocks contain a dropout layer (Srivastava et al., 2014) that omits 30% of the neurons.
Additionally, we integrate skip connections (Ronneberger et al., 2015): the outputs of corresponding encoding and decoding blocks (see Figure 12) are concatenated and fed as input to the next block. Introducing skip connections improved results greatly, even in our initial experiments with the downsampled data. In particular, skip connections helped bring out more detailed building structures.
Our approach is similar to SegNet (Badrinarayanan et al., 2015), which also uses block-wise encoders and decoders as well as upsampling on the decoder side. In contrast to our approach, SegNet always applies multiple convolutions per block, but of a smaller size. In addition, in our decoding blocks the convolution is applied before the upsampling step, whereas SegNet upsamples first. Furthermore, SegNet uses the indices of the max-pooling layers during upsampling, while in our approach information flows through the skip connections, much like in U-Net.
As the overall goal was to detect buildings in the input images, we set up the network as a binary classifier. Therefore, after the last convolution, we use a sigmoid activation function to obtain pixel-wise pseudo-probabilities, as discussed above.
To match the input and output sizes after downsampling and upsampling, we apply zero padding and cropping when necessary.
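A short calculation shows where padding or cropping becomes necessary for a 300 × 300 input. This sketch assumes one 2 × 2 pooling per encoding block (five for Net10, matching the five filter counts listed above), that pooling rounds odd sizes down, and that each decoder level is restored to the encoder size before the next upsampling; these are our assumptions, not details stated in the text.

```python
from typing import List, Tuple


def encoder_sizes(size: int, n_pools: int) -> List[int]:
    """Side length after each 2 x 2 max pooling, rounding down odd sizes."""
    sizes = [size]
    for _ in range(n_pools):
        sizes.append(sizes[-1] // 2)
    return sizes


def decoder_mismatches(size: int, n_pools: int) -> List[Tuple[int, int]]:
    """(upsampled size, encoder size to concatenate with) at each decoder level, deepest first."""
    enc = encoder_sizes(size, n_pools)
    pairs = []
    for level in range(n_pools - 1, -1, -1):
        upsampled = enc[level + 1] * 2   # 2 x 2 upsampling of the deeper feature map
        pairs.append((upsampled, enc[level]))
    return pairs


if __name__ == "__main__":
    for upsampled, skip in decoder_mismatches(300, 5):
        note = "ok" if upsampled == skip else "needs zero padding / cropping"
        print(upsampled, skip, note)
```

The odd size 75 is where the mismatch starts: it pools down to 37, and 37 upsampled gives 74, so the skip concatenation at that level (and the one below it) needs a one-pixel adjustment.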
8.4. Training
For the implementation and training of our approach, we used Keras (Chollet, 2015) with TensorFlow (Abadi et al., 2016) as the backend, training our models on multiple GPUs. The same procedure was followed to train the network configurations Net6, Net8, Net10, and Net12, which differ in depth. For all variants, the task was to map the 3-channel training images to the corresponding 1-channel ground-truth annotations. We used the Adam optimizer (Kingma and Ba, 2014) with its default settings ($\beta_1 = 0.9$, $\beta_2 = 0.999$) and mini-batch gradient descent. For Net6, Net8, and Net10, the initial learning rate was set to $\alpha = 10^{-3}$, and at every training step $t$ the learning rate was decayed according to $\alpha \leftarrow \alpha \cdot (1 + \delta_\alpha \cdot t)^{-1}$ with a decay rate of $\delta_\alpha = 5 \times 10^{-5}$. For Net12, the learning rate was also initialized to $\alpha = 10^{-3}$, but no decay was applied. The batch size was 32 or 64, depending on the network configuration, to make maximal use of 2 × NVIDIA Tesla P100 or 2 × GeForce GTX 1080 Ti GPUs, respectively. The replicated networks’ weights were merged on the local CPU at the end of each epoch.
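Reading the $\alpha$ on the right-hand side of the decay rule as the initial rate $\alpha_0$ (our assumption, matching how the `decay` argument of Keras 2.x optimizers is applied), the schedule halves the learning rate after $1/\delta_\alpha = 20{,}000$ steps. A minimal sketch:

```python
def inverse_time_decay(step: int, lr0: float = 1e-3, decay: float = 5e-5) -> float:
    """alpha_t = alpha_0 / (1 + delta * t), always computed from the initial rate alpha_0."""
    return lr0 / (1.0 + decay * step)


if __name__ == "__main__":
    for step in (0, 20_000, 100_000):
        print(step, inverse_time_decay(step))
    # Net12: no decay, so the rate stays at 1e-3.
    print(inverse_time_decay(100_000, decay=0.0))
```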
For the first 50 epochs, all networks were trained using binary cross-entropy loss. For further refinement, the networks were trained for 10 additional epochs with the Dice coefficient loss (Milletari et al., 2016).
With the setup described in this section, training each of the networks Net6, Net8, Net10, and Net12 took about 2.2 days on average.
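As we treated this challenge as a binary classification problem (building and non-building classes), the binary cross-entropy is defined as

$$\mathcal{L}_{\mathrm{BCE}}(y_t, y_p) = -\big(y_t \log(y_p) + (1 - y_t)\log(1 - y_p)\big),$$

where $y_t$ is the target and $y_p$ the predicted probability of the building class.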
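For completeness, the Dice coefficient loss mentioned above can be written in the squared-denominator form proposed by Milletari et al. (2016), with predicted probabilities $p_i$ and binary ground-truth labels $g_i$ over the $N$ pixels of an image. Whether exactly this variant (rather than, e.g., a non-squared denominator or an added smoothing term) was used here is not stated, so treat it as an assumption:

$$D = \frac{2 \sum_{i=1}^{N} p_i \, g_i}{\sum_{i=1}^{N} p_i^2 + \sum_{i=1}^{N} g_i^2}, \qquad \mathcal{L}_{\mathrm{Dice}} = 1 - D.$$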
8.5. Experiments and Results
The four network architectures of varying depth presented in this section were trained on the provided dataset following the training procedure described in Section 8.4. To find the optimal post-processing parameters, we varied the threshold θ from 0.2 to 0.95 and the size k of the square morphology kernel between 0 and 5. Even though pixel-wise accuracy was high because of the dominant background class, misclassified pixels led to a decrease in precision and recall.
An overview of the final results is reported in Table 3, with the morphology kernel for dilation and erosion set to 1. The table shows evaluation results on the official test set for all networks of varying depth.
![Table 3](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-t003.jpg)
TABLE 3. Precision and recall per epoch at detection (NMS threshold = 0.5 for the full validation set).
The network configurations Net6, Net8, Net10, and Net12 were evaluated with θ = 0.5. The results in Table 3 show that precision improves as the number of encoding and decoding blocks increases. This suggests that the precision of building detection might be improved further by enlarging the network architecture with more blocks.
The additional training with the Dice coefficient loss only slightly improved the performance of Net6, Net8, and Net12, and did not improve Net10.
As reported in Table 3, our final performance on the held-out test set was an $AP_{IoU \geq 0.5}$ of 0.912 and an $AR_{IoU \geq 0.5}$ of 0.942.
9. Instance Segmentation Using Mask R-Convolutional Neural Networks
In this section, we explore the use of Mask R-CNN, a two-stage object detection architecture, to detect a single class (buildings). We consider Mask R-CNN as an alternative to U-Net-based approaches because, with an IoU threshold of 0.5, the evaluation does not require the high pixel-level accuracy that U-Net-based pixel classification approaches provide. Single-stage object detection architectures such as SSD (Liu et al., 2016), on the other hand, simply learn bounding box regression and class probabilities. They have faster inference times, but Mask R-CNN has consistently shown better accuracy and additionally produces segmentation masks.
Mask R-CNN builds on the Region Proposal Network proposed in Faster R-CNN (Ren et al., 2015). This first-stage proposal network (illustrated in Figure 13) selects regions of interest from a predetermined set of anchors, using feature maps from a bottom-up ResNet-101 backbone (He et al., 2016) and a top-down feature pyramid network (Lin et al., 2017). It matches anchors of different sizes and aspect ratios, computed from a set of scales, to objects in an image. To implement our modifications of Mask R-CNN, we started from the open-source Matterport implementation of Mask R-CNN (Abdulla, 2017).
![Figure 13](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g013.jpg)
FIGURE 13. Mask R-CNN stages with modifications introduced for the Mapping Challenge (see Section 9 for a detailed explanation of the architecture and modifications).
For instance segmentation on satellite imagery, we used anchor scales of 8, 16, 32, 64, and 128. We chose smaller anchor sizes because the dataset contained a much larger proportion of “small” instances and, at the same time, the input images were at most 300 × 300 pixels. In the training and validation sets, 60% of the instances were medium-sized (area between 1,024 and 9,216 pixel²) and 37% were small (area less than 1,024 pixel²). 19% of all annotations had an area of less than 256 pixel² (in many cases = 16%, because of tile borders intersecting the edges of the buildings). The smaller anchor scales ensure that the region proposal network has anchors suited specifically to small objects. A montage containing the generated anchors, refined anchors with small deltas, ground truth, and prediction for an image is shown in Figure 14.
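To make the anchor scales concrete, the sketch below lists the anchor shapes they produce and buckets instances by the area ranges quoted above. The aspect ratios (0.5, 1, 2), taken as width/height with each anchor keeping an area of scale², are an assumption (a common default), and the treatment of the exact boundary values is ours.

```python
import math
from typing import List, Tuple

SCALES = (8, 16, 32, 64, 128)  # anchor scales from the text
RATIOS = (0.5, 1.0, 2.0)       # assumed width/height ratios (not stated in the text)


def anchor_shapes(scales=SCALES, ratios=RATIOS) -> List[Tuple[float, float]]:
    """(height, width) of each anchor; every anchor of scale s keeps an area of s**2."""
    return [(s / math.sqrt(r), s * math.sqrt(r)) for s in scales for r in ratios]


def size_category(area: float) -> str:
    """Bucket an instance by area in pixel^2: small < 1,024 (32^2) <= medium <= 9,216 (96^2) < large."""
    if area < 32 ** 2:
        return "small"
    if area <= 96 ** 2:
        return "medium"
    return "large"


if __name__ == "__main__":
    for h, w in anchor_shapes()[:3]:
        print(round(h, 1), round(w, 1))   # the three smallest anchors, each of area 64
    print(size_category(200), size_category(5000), size_category(20000))
```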
![Figure 14](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g014.jpg)
FIGURE 14. Intermediate anchor generation from region proposals and final mask predictions from the detection stage for the Mask R-CNN approach.
In the second (detection) stage, RoIAlign, a layer introduced with Mask R-CNN, aligns the feature maps from the first stage with the input image, resulting in better localization of the output masks.
The Mask R-CNN implementation creates a placeholder for the ground-truth masks that can consume a large amount of memory: up-sampling the image creates a mask placeholder with the up-sampled dimensions. Changing the stride as mentioned instead reduces memory requirements, although the reduced down-sampling increases the overall compute overhead. On an Amazon AWS P3 instance (p3.2xlarge), which has a single NVIDIA Tesla V100 GPU with 16 GB of GPU memory, one epoch with a batch size of 4 takes roughly 16 h.
In the Mask R-CNN implementation, anchors are labeled negative when their IoU with the ground-truth boxes is less than 0.3 and positive when it is greater than or equal to 0.7. To generate targets for the stage-2 classifier and mask heads without using the RPN head, the default implementation uses a threshold of 0.5 to distinguish negative ROIs from positive ROIs. Labeling ROIs with an IoU of 0.1–0.5 as negative ensures that negatives have some minimum intersection with the ground truth; ROIs with an IoU below 0.1 are discarded. This introduces a form of hard example mining, since even negative samples must satisfy a minimal intersection criterion. We padded the images by 10 pixels on each side to create final input images of 320 × 320 pixels. Padding ensures that more valid anchors are available at the border to select the best-matching ROIs, including for elongated instances with a small width or height.
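The stage-2 target assignment described above can be sketched as follows. The thresholds 0.1 and 0.5 and the 10-pixel padding come from the text; treating exactly 0.5 as positive and exactly 0.1 as negative is our assumption, and this is not code from the Matterport implementation.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (y1, x1, y2, x2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def roi_label(roi: Box, gt_boxes: Sequence[Box],
              pos_thresh: float = 0.5, neg_min: float = 0.1) -> Optional[str]:
    """'positive' if max IoU >= 0.5, 'negative' if 0.1 <= max IoU < 0.5, None (discarded) otherwise."""
    best = max((iou(roi, g) for g in gt_boxes), default=0.0)
    if best >= pos_thresh:
        return "positive"
    if best >= neg_min:
        return "negative"
    return None


PAD = 10                      # pixels added on each side of the image
PADDED_SIDE = 300 + 2 * PAD   # 320 x 320 network input
```

Because an ROI that barely touches a building becomes a negative rather than being ignored, the classifier is forced to learn the hard boundary cases.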
One potential avenue to improve small-object detection is dilation (Yu and Koltun, 2015), which enlarges the receptive field without losing resolution and can therefore provide context for detecting small buildings. We set the dilation rate to 2 on all combinations of stages in the ResNet backbone. Since this dataset contains a single object class, discriminative information from context may be less useful here, but providing context is in general a valuable means of detecting small objects.
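The effect of the dilation rate on the receptive field can be quantified directly. A $k \times k$ kernel with dilation rate $d$ covers an effective extent of

$$k_{\mathrm{eff}} = k + (k - 1)(d - 1),$$

so, assuming the dilation is applied to the 3 × 3 convolutions of the ResNet bottleneck blocks (our assumption), a rate of $d = 2$ gives $k_{\mathrm{eff}} = 3 + 2 \cdot 1 = 5$, that is, a 5 × 5 footprint while still using only 9 weights and without any down-sampling.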
To train on the Mapping Challenge dataset, we initialized the Mask R-CNN network with a model pretrained on the MS COCO dataset and then trained it on the Mapping Challenge dataset for eight epochs, as shown in Table 4. Epochs 1 to 4 minimized the loss using stochastic gradient descent at a learning rate of 0.001, and epochs 5 to 7 used a learning rate of 0.0001. In epoch 8, half of the samples were trained at a learning rate of 0.0001 and the other half at $10^{-5}$. We also applied horizontal and vertical flips as augmentation during training.
![Table 4](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-t004.jpg)
TABLE 4. Overview and comparison of results for each model.
Epochs 2 to 4 were trained on a smaller subset of the training set, obtained by keeping only images that contain at least one object with an area of less than 256 pixel². Medium and large objects already had high precision and recall at epoch 1, as shown in Figure 15.
![Figure 15](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g015.jpg)
FIGURE 15. Comparison of precision and recall for different object sizes when using the Mask R-CNN approach detailed in Section 9.
Figure 15 shows that from epoch 1 to epoch 8, small-object precision improved from 0.706 to 0.82, while medium-object precision improved only marginally, from 0.9857 to 0.9874. One explanation is that the smallest anchor scale used for the MS COCO dataset was 32, whereas ours was 8. Although the hierarchical representation learned by the ResNet backbone on COCO was useful, especially in the bottom layers, the Mask R-CNN model had not seen many small-object samples, so the whole pipeline required more training to become as effective for small objects. Weights trained on samples with the mask loss weight set to 10 were also kept as part of the final model. A more straightforward schedule that might result in a better model would be to train for four epochs at a learning rate of 0.001, another four at 0.0001, and one at $10^{-5}$, without subsetting and without changing the mask loss weight. At validation and test time, predictions on vertically and horizontally flipped copies of each image are merged with the regular prediction, and non-maximum suppression is then applied to obtain the final predicted annotations.
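The flip-and-merge step at inference time can be sketched for boxes as follows: detections made on a flipped image are mapped back to original coordinates, pooled with the regular detections, and filtered with greedy non-maximum suppression. Masks are omitted for brevity, and the NMS IoU threshold of 0.5 is an assumption.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (y1, x1, y2, x2) in pixel coordinates
Det = Tuple[Box, float]                   # (box, score)


def iou(a: Box, b: Box) -> float:
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def unflip(det: Det, height: int, width: int, horizontal: bool) -> Det:
    """Map a detection made on a flipped image back to the original image coordinates."""
    (y1, x1, y2, x2), score = det
    if horizontal:
        return (y1, width - x2, y2, width - x1), score
    return (height - y2, x1, height - y1, x2), score


def nms(dets: List[Det], iou_thresh: float = 0.5) -> List[Det]:
    """Greedy NMS: keep the highest-scoring box, drop boxes that overlap a kept one."""
    kept: List[Det] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(det[0], k[0]) < iou_thresh for k in kept):
            kept.append(det)
    return kept


def merge_flip_predictions(regular: List[Det], hflipped: List[Det], vflipped: List[Det],
                           height: int, width: int, iou_thresh: float = 0.5) -> List[Det]:
    merged = list(regular)
    merged += [unflip(d, height, width, horizontal=True) for d in hflipped]
    merged += [unflip(d, height, width, horizontal=False) for d in vflipped]
    return nms(merged, iou_thresh)
```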
The multi-task loss in Mask R-CNN includes losses from the region proposal stage and the detection stage. The region proposal loss consists of a class loss for positive and negative ROIs and the associated bounding box loss. The detection-stage loss consists of an object class loss, a bounding box loss, and a mask loss. Weighting a particular loss changes its contribution to the overall loss. In this implementation, we increased the weight of the mask loss tenfold near the end of training for better semantic segmentation.
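Written out, with symbol names of our own choosing and unit weights on all terms except the mask loss, the overall objective is

$$\mathcal{L} = \mathcal{L}^{\mathrm{RPN}}_{\mathrm{cls}} + \mathcal{L}^{\mathrm{RPN}}_{\mathrm{box}} + \mathcal{L}_{\mathrm{cls}} + \mathcal{L}_{\mathrm{box}} + w_{\mathrm{mask}} \, \mathcal{L}_{\mathrm{mask}},$$

with $w_{\mathrm{mask}} = 1$ for most of training and $w_{\mathrm{mask}} = 10$ near its end. In the Matterport code base, such per-term weights are, to our knowledge, set through the `LOSS_WEIGHTS` configuration dictionary, whose entries default to 1.0.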
Our final performance on the held-out test set was an $AP_{IoU \geq 0.5}$ of 0.937 and an $AR_{IoU \geq 0.5}$ of 0.959.
10. Conclusion
In this work, we explore different flavors of U-Net and Mask R-CNN for detecting buildings through instance segmentation of high-resolution satellite imagery. The dataset used was derived from the SpaceNet dataset (Spacenet on aws, 2018) and was post-processed to make it more accessible to a broader set of deep learning researchers, who may not be familiar with handling and manipulating raw satellite imagery. The evaluation metric used for all experiments was designed to reward loose segmentation (a prediction with IoU ≥ 0.5 was considered a correct detection) of buildings of various shapes and sizes.
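The IoU ≥ 0.5 criterion can be illustrated with a simplified matcher on pixel-set masks. It reports precision and recall at a single operating point rather than averaging precision over recall levels as COCO-style AP does, and the greedy, score-ordered matching is an assumption modelled on common practice, not the challenge's official evaluator. The `square` helper is hypothetical.

```python
from typing import FrozenSet, Sequence, Set, Tuple

Pixels = FrozenSet[Tuple[int, int]]


def mask_iou(a: Pixels, b: Pixels) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def precision_recall_at_iou(preds: Sequence[Tuple[Pixels, float]], gts: Sequence[Pixels],
                            thresh: float = 0.5) -> Tuple[float, float]:
    """Match predictions (highest score first) to unmatched ground truths with IoU >= thresh."""
    matched: Set[int] = set()
    true_positives = 0
    for mask, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gts):
            if idx in matched:
                continue
            value = mask_iou(mask, gt)
            if value > best_iou:
                best_iou, best_idx = value, idx
        if best_idx is not None and best_iou >= thresh:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(gts) if gts else 0.0
    return precision, recall


def square(row: int, col: int, n: int) -> Pixels:
    """Hypothetical helper: the pixels of an n x n square with top-left corner (row, col)."""
    return frozenset((row + i, col + j) for i in range(n) for j in range(n))
```

Under this criterion, a prediction covering three quarters of a building counts fully, which is why loosely shaped masks can still score well.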
In the previous sections, we presented four U-Net implementations, each with its own specific adaptations, and one Mask R-CNN approach, which was ultimately found to be the best-performing model. A comparison of example results for all these architectures is shown in Figure 16, and detailed results on a test dataset are given in Table 5.
![Figure 16](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-g016.jpg)
FIGURE 16. Inference results of each model on a few sample images: Model 1 - Instance Segmentation Using Deeper U-Net (Section 7); Model 2 - Instance Segmentation Using Mask R-CNN (Section 9); Model 3 - Adapted U-Net Architecture of Varying Depths (Section 8); Model 4 - Fast Learnings for Fast Mapping Using U-Net (Section 6); Model 5 - Instance Segmentation Using Customized U-Net (Section 5).
![Table 5](https://www.frontiersin.org/files/Articles/534696/frai-03-534696-HTML-r3/image_m/frai-03-534696-t005.jpg)
TABLE 5. Explanation of computational steps of the U-Net pipeline (see Figure 2).
The task presented two major difficulties arising from the choice of evaluation metric: the model had to detect small instances as effectively as larger ones, and it had to resolve the thin separation areas between closely located buildings. The first U-Net approach, presented in Section 5, relied on a custom weighted loss function to alleviate these difficulties, penalizing more heavily the misclassification of pixels located in small instances or in the separation areas between two close buildings. At test time, several images were generated by rotating and flipping the original ones, and a gradient-boosting algorithm (LightGBM) was used to construct the final prediction mask from the aggregated U-Net outputs. Although this approach reached high scores ($AP_{IoU \geq 0.5}$ of 0.938, $AR_{IoU \geq 0.5}$ of 0.946), another U-Net approach, presented in Section 7, achieves similar scores without any data augmentation, custom weight coefficients in the loss, or a boosting algorithm on top of the results. That U-Net, however, was made one layer deeper than the original U-Net implementation, which increased the training time. The approach in Section 8, which originated from an autoencoder-like structure, systematically analyzed the influence of depth on a U-Net-like architecture. It also investigated further improvements on top of the autoencoder network through image processing tools such as the morphological opening filter and CRFs. Deeper networks further improved performance, but training time increased while the gains grew smaller. These networks might be improved further by introducing data augmentation, as in the first approach, which may become more important as the number of layers and parameters grows.
In Section 6, another U-Net is presented that applies different adaptations during training: it was trained with customized learning rate and momentum schedules, which reduced the time needed to train the model to convergence. Both of these U-Net approaches reached high scores, similar to those attained by the two approaches discussed first. Finally, the best-performing model is the one presented in Section 9, which implements Mask R-CNN to solve the segmentation task. To improve the detection of small instances, the anchor sizes were reduced compared to the original implementation. Once trained, this model reached an $AP_{IoU \geq 0.5}$ of 0.937 and an $AR_{IoU \geq 0.5}$ of 0.959.
All the approaches presented in this paper proved to be effective ways of solving the proposed building segmentation task on satellite images. Interestingly, although each approach came with its own adaptations, the different models reached similar $AP_{IoU \geq 0.5}$ and $AR_{IoU \geq 0.5}$ scores.
Data Availability Statement
All the datasets used in the experiments in this paper are available at: https://www.aicrowd.com/challenges/mapping-challenge. The test dataset is not released, as the goal of this work is to set up an ongoing benchmark. However, participants can evaluate their models by submitting solutions at https://www.aicrowd.com/challenges/mapping-challenge.
Author Contributions
The contributors to the paper are SM, JC, KK, AP, PT, SK, JR, DL, MP, SF, JG, MSc, AT, GM, NR, and MSa. SM came up with the idea and the baseline, aggregated the required dataset, and hosted the challenge. JC, KK, AP, PT, SK, JR, DL, MP, SF, JG, MSc, and AT are participants in the challenge; GM and NR aggregated the results and analysis for the research paper; and MSa helped in conceptualising the problem statement. All authors contributed to the article and approved the submitted version.
The work presented in Section 8 was supported by the Cluster of Excellence Cognitive Interaction Technology “CITEC” (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG).
Conflict of Interest
Authors JC and KK were employed by the company neptune.ml; AP and PT were employed by deepsense.ai; SK was employed by NARMA; and DL was employed by Anthropocene Labs. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Acknowledgments
The authors would like to thank Lars Bromley (UNOSAT), Miguel Luengo-Oroz (UN Global Pulse) and Paul Vermutllen (Humanity and Inclusion) for their continued support and help in making this work possible. Additionally, the authors would like to thank the authors of the SpaceNet (Spacenet on aws, 2018) dataset for allowing us to use it as the source dataset in the competition.
a The individual training, validation and test sets are available at: https://www.aicrowd.com/challenges/mapping-challenge/dataset_files
b open source framework, available here: https://pytorch.org/
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). “TensorFlow: a system for large-scale machine learning,” in OSDI (Savannah, GA: USENIX Association), Vol. 16, 265–283.
Abdulla, W. (2017). Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow. Available at: https://github.com/matterport/Mask_RCNN (Accessed October 9, 2020).
Atif, N., Bhuyan, M., and Ahamed, S. (2019). “A review on semantic segmentation from a modern perspective,” in 2019 International Conference on Electrical, Electronics and Computer Engineering (UPCON), November 8–10, 2019, Aligarh, India (Piscataway, NJ: IEEE), 1–6.
Badrinarayanan, V., Kendall, A., and Cipolla, R. (2015). “SegNet: a deep convolutional encoder-decoder architecture for image segmentation,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, December 1, 2017, Vol. 39, No. 12, 2481–2495. CoRR, abs/1511.00561.
Bradski, G. (2000). The OpenCV library. Dr Dobb’s J. Software Tools 25, 120–125.
Ben Hamida, A., Benoit, A., Lambert, P., Klein, L., Ben Amar, C., and Audebert, N. (2017). “Deep learning for semantic segmentation of remote sensing images with rich spectral content,” in 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) (Piscataway, NJ: IEEE), 2569–2572.
Castillo-Navarro, J., Audebert, N., Boulch, A., Le Saux, B., and Lefèvre, S. (2019). “What data are needed for semantic segmentation in earth observation?,” in 2019 Joint Urban Remote Sensing Event (JURSE), May 2019, Vannes, France, 1–4.
Chen, G., Zhang, X., Wang, Q., Dai, F., Gong, Y., and Zhu, K. (2018). Symmetrical dense-shortcut deep fully convolutional networks for semantic segmentation of very-high-resolution remote sensing images. IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 11 (5), 1633–1644. doi:10.1109/jstars.2018.2810320
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2018a). DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40 (4), 834–848.
Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018b). “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proceedings of the European Conference on Computer Vision (ECCV), September 8–14, 2018, Munich, Germany, 801–818.
Chollet, F. (2015). Keras. Available at: https://github.com/fchollet/keras and https://blog.keras.io/building-autoencoders-in-keras.html (Accessed October 9, 2020).
Drozdzal, M., Vorontsov, E., Chartrand, G., Kadoury, S., and Pal, C. (2016). “The importance of skip connections in biomedical image segmentation,” in Deep Learning and Data Labeling for Medical Applications. DLMIA 2016, LABELS 2016. Lecture Notes in Computer Science, Editor G. Carneiro (Cham, Switzerland: Springer), Vol. 10008. CoRR, abs/1608.04117.
Fukushima, K. (1980). Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybernetics 36 (4), 193–202. doi:10.1007/bf00344251
Giannakopoulos, P. (2017). Kaggle-carvana-image-masking-challenge. Available at: https://github.com/petrosgk/Kaggle-Carvana-Image-Masking-Challenge (Accessed October 9, 2020).
Gugger, S. (2017a). How do you find a good learning rate. Available at: https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html (Accessed October 9, 2020).
Gugger, S. (2017b). The 1cycle policy. Available at: https://sgugger.github.io/the-1cycle-policy.html.
Hao, S., Zhou, Y., and Guo, Y. (2020). A brief survey on semantic segmentation with deep learning. Neurocomputing 406, 302–321. doi:10.1016/j.neucom.2019.11.118
He, K., Zhang, X., Ren, S., and Sun, J. (2016). “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (Las Vegas, NV: IEEE), 770–778.
He, K., Gkioxari, G., Dollár, P., and Girshick, R. B. (2017). “Mask R-CNN,” in 2017 IEEE International Conference on Computer Vision (ICCV) (Venice: IEEE), 2980–2988. CoRR, abs/1703.06870.
He, X., Zemel, R. S., and Carreira-Perpiñán, M. Á. (2004). “Multiscale conditional random fields for image labeling,” in Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004) (New York, NY: IEEE), Vol. 2, II.
Howard, J. (2018). Training ImageNet in 3 hours for $25; and CIFAR10 for $0.26. Available at: http://www.fast.ai/2018/04/30/dawnbench-fastai/.
Howard, J., and Ruder, S. (2018). “Universal language model fine-tuning for text classification,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers), July 15–20, 2018, Melbourne, Australia.
Iglovikov, V. I., Seferbekov, S., Buslaev, A. V., and Shvets, A. (2018). TernausNetV2: fully convolutional network for instance segmentation. ArXiv e-prints.
Ioffe, S., and Szegedy, C. (2015). “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning, PMLR 37, Lille, France, Vol. 37, 448–456.
Jégou, S., Drozdzal, M., Vazquez, D., Romero, A., and Bengio, Y. (2017). “The one hundred layers tiramisu: fully convolutional DenseNets for semantic segmentation,” in Proceedings of the Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, 11–19.
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., et al. (2017). “LightGBM: a highly efficient gradient boosting decision tree,” in 31st Conference on Neural Information Processing Systems (NIPS 2017), December 2017, Long Beach, CA, 3149–3157.
Kingma, D. P., and Ba, J. (2014). Adam: a method for stochastic optimization. CoRR, abs/1412.6980.
Krähenbühl, P., and Koltun, V. (2012). Efficient inference in fully connected CRFs with Gaussian edge potentials. Stanford, CA: Stanford University. CoRR, abs/1210.5644.
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems (Cambridge, MA: The MIT Press), 1097–1105.
Lateef, F., and Ruichek, Y. (2019). Survey on semantic segmentation using deep learning techniques. Neurocomputing , 338, 321–348. doi:10.1016/j.neucom.2019.02.003
LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521 (7553), 436–444. doi:10.1038/nature14539
Smith, L. N., and Topin, N. (2017). “Super-convergence: very fast training of neural networks using large learning rates,” in Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications (Ithaca, NY: Cornell University).
Li, R., Liu, W., Yang, L., Sun, S., Hu, W., Zhang, F., et al. (2017). “DeepUNet: a deep fully convolutional network for pixel-level sea-land segmentation,” in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, November 2018 (IEEE), Vol. 11, No. 11, 3954–3962.
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017). “Feature pyramid networks for object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 936–944.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., et al. (2016). “SSD: single shot multibox detector,” in Computer Vision – ECCV 2016. Lecture Notes in Computer Science, Editors B. Leibe, J. Matas, N. Sebe, and M. Welling (Cham: Springer), Vol. 9905.
Long, J., Shelhamer, E., and Darrell, T. (2015). “Fully convolutional networks for semantic segmentation,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 3431–3440.
Milletari, F., Navab, N., and Ahmadi, S.-A. (2016). “V-Net: fully convolutional neural networks for volumetric medical image segmentation,” in 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, 565–571.
Minaee, S., Boykov, Y., Porikli, F., Plaza, A., Kehtarnavaz, N., and Terzopoulos, D. (2020). Image segmentation using deep learning: a survey. arXiv preprint arXiv:2001.05566.
Ronneberger, O., Fischer, P., and Brox, T. (2015). “U-Net: convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention (MICCAI), Vol. 9351 of LNCS (New York, NY: Springer), 234–241. Available on arXiv:1505.04597 [cs.CV].
Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28, 91–99.
Shotton, J., Winn, J., Rother, C., and Criminisi, A. (2009). TextonBoost for image understanding: multi-class object recognition and segmentation by jointly modeling texture, layout, and context. Int. J. Comput. Vis. 81 (1), 2–23. doi:10.1007/s11263-007-0109-1
Smith, L. N. (2017). “Cyclical learning rates for training neural networks,” in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, CA, 464–472.
Spacenet on aws (2018). Available at: https://spacenetchallenge.github.io/datasets/datasetHomePage.html (Accessed July 16, 2018).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15 (1), 1929–1958.
Stanford DAWNBench (2018). Stanford DAWNBench competition. Available at: https://dawn.cs.stanford.edu/benchmark.
Thoma, M. (2016). A survey of semantic segmentation. CoRR, abs/1602.06541.
Wu, H., Zhang, J., Huang, K., Liang, K., and Yu, Y. (2019). FastFCN: rethinking dilated convolution in the backbone for semantic segmentation. arXiv preprint arXiv:1903.11816. doi:10.1109/iccv.2019.00687
Xie, S., Girshick, R. B., Dollár, P., Tu, Z., and He, K. (2016). “Aggregated residual transformations for deep neural networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI. CoRR, abs/1611.05431.
Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.
Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., et al. (2015). “Conditional random fields as recurrent neural networks,” in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 1529–1537. CoRR, abs/1502.03240.
Keywords: deep learning, machine learning, remote sensing, satellite imagery, semantic segmentation
Citation: Mohanty SP, Czakon J, Kaczmarek KA, Pyskir A, Tarasiewicz P, Kunwar S, Rohrbach J, Luo D, Prasad M, Fleer S, Göpfert JP, Tandon A, Mollard G, Rayaprolu N, Salathe M and Schilling M (2020) Deep Learning for Understanding Satellite Imagery: An Experimental Survey. Front. Artif. Intell. 3:534696. doi:10.3389/frai.2020.534696
Received: 13 February 2020; Accepted: 22 September 2020; Published: 16 November 2020.
Copyright © 2020 Mohanty, Czakon, Kaczmarek, Pyskir, Tarasiewicz, Kunwar, Rohrbach, Luo, Prasad, Fleer, Göpfert, Tandon, Mollard, Rayaprolu, Salathé, and Schilling. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Nikhil Rayaprolu, Sharada Prasanna Mohanty, Marcel Salathe
Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
Image processing techniques for analysis of satellite images for historical maps classification—an overview.
- 1. Introduction
- 2. Image Processing in Remote Sensing
- 2.1. Image Enhancement
- 2.2. Feature Extraction
- 2.3. Image Segmentation
- 2.4. Image Fusion
- 2.5. Change Detection
- 2.6. Image Compression
- 2.7. Image Classification
- 2.8. Image Feature Detection
- 3. Performance Analysis
- 4. Discussion
- 5. Conclusions
- Author Contributions
- Conflicts of Interest
- Ping, X.; Bingqiang, C.; Lingyun, X.; Jingcheng, Z.; Lei, Z.; Hangbo, D. A new MNF–BM4D denoising algorithm based on guided filtering for hyperspectral images. ISA Trans. 2019 , 92 , 315–324. [ Google Scholar ]
- Chang, Y.C. A flexible contrast enhancement method with visual effects and brightness preservation: Histogram planting. Comput. Electr. Eng. 2018 , 69 , 796–807. [ Google Scholar ] [ CrossRef ]
- Suresh, S.; Lal, S. Modified differential evolution algorithm for contrast and brightness enhancement of satellite images. Appl. Soft Comput. J. 2017 , 61 , 622–641. [ Google Scholar ] [ CrossRef ]
- Singh, H.; Kumar, A.; Balyan, L.K.; Singh, G.K. A novel optimally weighted framework of piecewise gamma corrected fractional order masking for satellite image enhancement. Comput. Electr. Eng. 2019 , 75 , 245–261. [ Google Scholar ] [ CrossRef ]
- Tang, S.; Wu, B.; Zhu, Q. Combined adjustment of multi-resolution satellite imagery for improved geo-positioning accuracy. ISPRS J. Photogramm. Remote Sens. 2016 , 114 , 125–136. [ Google Scholar ] [ CrossRef ]
- Vijayaraj, V.; Bright, E.A.; Bhaduri, B.L. Rapid damage assessment from high resolution imagery. Int. Geosci. Remote Sens. Symp. 2008 , 3 , 1445–1448. [ Google Scholar ]
- Yuan, X.; Chen, S.; Yuan, W.; Cai, Y. Poor textural image tie point matching via graph theory. ISPRS J. Photogramm. Remote Sens. 2017 , 129 , 21–31. [ Google Scholar ] [ CrossRef ]
- Sedaghat, A.; Mohammadi, N. Uniform competency-based local feature extraction for remote sensing images. ISPRS J. Photogramm. Remote Sens. 2018 , 135 , 142–157. [ Google Scholar ] [ CrossRef ]
- Rathore, M.M.U.; Ahmad, A.; Paul, A.; Wu, J. Real-time continuous feature extraction in large size satellite images. J. Syst. Archit. 2016 , 64 , 122–132. [ Google Scholar ] [ CrossRef ]
- Zhang, L.; Sun, Q. Saliency detection and region of interest extraction based on multi-image common saliency analysis in satellite images. Neurocomputing 2018 , 283 , 150–165. [ Google Scholar ] [ CrossRef ]
- Huang, X.; Chen, H.; Gong, J. Angular difference feature extraction for urban scene classification using ZY-3 multi-angle high-resolution satellite imagery. ISPRS J. Photogramm. Remote Sens. 2018 , 135 , 127–141. [ Google Scholar ] [ CrossRef ]
- Schmidt, A.; Lafarge, F.; Brenner, C.; Rottensteiner, F.; Heipke, C. Forest point processes for the automatic extraction of networks in raster data. ISPRS J. Photogramm. Remote Sens. 2017 , 126 , 38–55. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Ciecholewski, M. River channel segmentation in polarimetric SAR images: Watershed transform combined with average contrast maximisation. Expert Syst. Appl. 2017 , 82 , 196–215. [ Google Scholar ] [ CrossRef ]
- Suresh, S.; Lal, S. An efficient cuckoo search algorithm based multilevel thresholding for segmentation of satellite images using different objective functions. Expert Syst. Appl. 2016 , 58 , 184–209. [ Google Scholar ] [ CrossRef ]
- Grinias, I.; Panagiotakis, C.; Tziritas, G. MRF-based segmentation and unsupervised classification for building and road detection in peri-urban areas of high-resolution satellite images. ISPRS J. Photogramm. Remote Sens. 2016 , 122 , 145–166. [ Google Scholar ] [ CrossRef ]
- Marmanis, D.; Schindler, K.; Wegner, J.D.; Galliani, S.; Datcu, M.; Stilla, U. Classification with an edge: Improving semantic image segmentation with boundary detection. ISPRS J. Photogramm. Remote Sens. 2018 , 135 , 158–172. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Pare, S.; Bhandari, A.K.; Kumar, A.; Singh, G.K. A new technique for multilevel color image thresholding based on modified fuzzy entropy and Lévy flight firefly algorithm. Comput. Electr. Eng. 2018 , 70 , 476–495. [ Google Scholar ] [ CrossRef ]
- Alshehhi, R.; Marpu, P.R. Hierarchical graph-based segmentation for extracting road networks from high-resolution satellite images. ISPRS J. Photogramm. Remote Sens. 2017 , 126 , 245–260. [ Google Scholar ] [ CrossRef ]
- Anjos, A.; dos Shahbazkia, H.R. Bi-Level Image Thresholding. In Proceedings of the First International Conference on Bio-inspired Systems and Signal Processing, Funchal, Portugal, 28–31 January 2008; pp. 70–76. [ Google Scholar ]
- Zheng, X.; Ye, H.; Tang, Y. Image bi-level thresholding based on gray level-local variance histogram. Entropy 2017 , 19 , 191. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017 , 39 , 2481–2495. [ Google Scholar ] [ CrossRef ]
- Bhandari, A.K.; Kumar, A.; Singh, G.K. Tsallis entropy based multilevel thresholding for colored satellite image segmentation using evolutionary algorithms. Expert Syst. Appl. 2015 , 42 , 8707–8730. [ Google Scholar ] [ CrossRef ]
- Su, T.; Zhang, S. Local and global evaluation for remote sensing image segmentation. ISPRS J. Photogramm. Remote Sens. 2017 , 130 , 256–276. [ Google Scholar ] [ CrossRef ]
- Tian, J.; Dezert, J. Fusion of multispectral imagery and DSMs for building change detection using belief functions and reliabilities. Int. J. Image Data Fusion 2018 , 9832 , 1–27. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Ferraris, V.; Dobigeon, N.; Wei, Q.; Chabert, M. Detecting Changes Between Optical Images of Different Spatial and Spectral Resolutions: A Fusion-Based Approach. IEEE Trans. Geosci. Remote Sens. 2016 , 56 , 1566–1578. [ Google Scholar ] [ CrossRef ]
- Gangkofner, U.G.; Pradhan, P.S.; Holcomb, D.W. Optimizing the high-pass filter addition technique for image fusion. Photogramm. Eng. Remote Sens. 2008 , 74 , 1107–1118. [ Google Scholar ] [ CrossRef ]
- Al-wassai, F.A.; Kalyankar, N.V.; Al-zuky, A.A. Arithmetic and Frequency Filtering Methods of Pixel-Based Image Fusion Techniques. Int. J. Comput. Sci. Issues 2011 , 8 , 113–122. [ Google Scholar ]
- de Béthune, S.; Muller, F.; Binard, M. Adaptive intensity matching filters: A new tool for multiresolution data fusion. In Proceedings of the Multi-Sensor Systems and Data Fusion for Telecommunications, Remote Sensing and Radar, Lisbon, Portugal, 29 September–22 October 1997; pp. 28.1–28.15. [ Google Scholar ]
- Cornet, Y.; de Bethune, S.; Binard, M.; Muller, F.; Legros, G.; Nadasdi, I. RS data fusion by local mean and variance matching algorithms: Their respective efficiency in a complex urban context. In Proceedings of the IEEEASPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas, Rome, Italy, 8–9 November 2001; pp. 105–109. [ Google Scholar ]
- Gaikwad, A.J.; Vishwakarma, S.; Bhalerao, S. A Review on Various Image Fusion Techniques. Int. J. Adv. Res. Comput. Commun. Eng. ISO 2016 , 5 , 144–149. [ Google Scholar ]
- Chang, N.; Member, S.; Bai, K.; Imen, S.; Chen, C.; Gao, W. Multisensor Satellite Image Fusion and Networking for All-Weather Environmental Monitoring. IEEE Syst. J. 2018 , 12 , 1341–1357. [ Google Scholar ] [ CrossRef ]
- Mascarenhas, N.D.A.; Banon, G.J.F.; Candeias, A.L.B. Multispectral image data fusion under a Bayesian approach. Int. J. Remote Sens. 1996 , 17 , 1457–1471. [ Google Scholar ] [ CrossRef ]
- Fasbender, D.; Brasseur, O.; Bogaert, P. Bayesian data fusion for space–Time prediction of air pollutants: The case of NO 2 in Belgium. Atmos. Environ. 2009 , 43 , 4632–4645. [ Google Scholar ] [ CrossRef ]
- Chang, N.; Vannah, B.W.; Yang, Y.J. Integrated data fusion and mining techniques for monitoring total organic carbon concentrations in a lake. Int. J. Remote Sens. 2014 , 35 , 1064–1093. [ Google Scholar ] [ CrossRef ]
- Luo, X.; Zhang, Z.; Wu, X. A novel algorithm of remote sensing image fusion based on shift-invariant Shearlet transform and regional selection. Int. J. Electron. Commun. 2016 , 70 , 186–197. [ Google Scholar ] [ CrossRef ]
- Maurya, L.; Mahapatra, P.K.; Kumar, A. A social spider optimized image fusion approach for contrast enhancement and brightness preservation. Appl. Soft Comput. 2017 , 52 , 575–592. [ Google Scholar ] [ CrossRef ]
- Zhu, Z.; Yin, H.; Chai, Y.; Li, Y.; Qi, G. A novel multi-modality image fusion method based on image decomposition and sparse representation. Inf. Sci. 2018 , 432 , 516–529. [ Google Scholar ] [ CrossRef ]
- Anandhi, D.; Valli, S. An algorithm for multi-sensor image fusion using maximum a posteriori and nonsubsampled contourlet transform. Comput. Electr. Eng. 2018 , 65 , 139–152. [ Google Scholar ] [ CrossRef ]
- Li, Y.; He, Z.; Zhu, H.; Zhang, W.; Wu, Y. Jointly registering and fusing images from multiple sensors. Inf. Fusion 2016 , 27 , 85–94. [ Google Scholar ] [ CrossRef ]
- Kim, M.; Han, D.K.; Ko, H. Joint patch clustering-based dictionary learning for multimodal image fusion. Inf. Fusion 2016 , 27 , 198–214. [ Google Scholar ] [ CrossRef ]
- Wu, M.; Huang, W.; Niu, Z.; Wang, C.; Li, W.; Yu, B. Validation of synthetic daily Landsat NDVI time series data generated by the improved spatial and temporal data fusion approach. Inf. Fusion 2018 , 40 , 34–44. [ Google Scholar ] [ CrossRef ]
- Zou, J.; Li, W.; Chen, C.; Du, Q. Scene classification using local and global features with collaborative representation fusion. Inf. Sci. 2016 , 348 , 209–226. [ Google Scholar ] [ CrossRef ]
- Li, X.; Wang, L.; Wang, J.; Zhang, X. Multi-focus image fusion algorithm based on multilevel morphological component analysis and support vector machine. IET Image Process 2017 , 11 , 919–926. [ Google Scholar ] [ CrossRef ]
- Roy, D.P.; Ju, J.; Lewis, P.; Schaaf, C.; Gao, F.; Hansen, M.; Lindquist, E. Multi-temporal MODIS-Landsat data fusion for relative radiometric normalization, gap filling, and prediction of Landsat data. Remote Sens. Environ. 2008 , 112 , 3112–3130. [ Google Scholar ] [ CrossRef ]
- Gao, F.; Hilker, T.; Zhu, X.; Anderson, M.; Masek, J.; Wang, P.; Yang, Y. Fusing Landsat and MODIS Data for Vegetation Monitoring. IEEE Geosci. Remote Sens. Mag. 2015 , 3 , 47–60. [ Google Scholar ] [ CrossRef ]
- Gong, M.; Zhan, T.; Zhang, P.; Miao, Q. Superpixel-Based Difference Representation Learning for Change Detection in Multispectral. IEEE Trans. Geosci. Remote Sens. 2017 , 55 , 2658–2673. [ Google Scholar ] [ CrossRef ]
- Asokan, A.; Anitha, J. Change detection techniques for remote sensing applications: A survey. Earth Sci. Inform. 2019 , 12 , 143–160. [ Google Scholar ] [ CrossRef ]
- Gandhi, M.S.P.; Thummalu, N.A.C. Ndvi: Vegetation change detection using remote sensing and gis—A case study of Vellore District. In Proceedings of the 3rd International Conference on Recent Trends in Computing (ICRTC), Delhi, India, 12–13 March 2015; pp. 1199–1210. [ Google Scholar ]
- Rawat, J.S.; Kumar, M. Monitoring land use/cover change using remote sensing and GIS techniques: A case study of Hawalbagh block, district Almora, Uttarakhand, India. Egypt. J. Remote Sens. Space Sci. 2015 , 18 , 77–84. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Xu, D.; Chen, R.; Xing, X.; Lin, W. Detection of decreasing vegetation cover based on empirical orthogonal function and temporal unmixing analysis. Math. Probl. Eng. 2017 , 2017 . [ Google Scholar ] [ CrossRef ]
- Jabari, S.; Zhang, Y. RPC-Based Coregistration of VHR Imagery for Urban Change Detection. Photogramm. Eng. Remote Sens. 2016 , 82 , 521–534. [ Google Scholar ] [ CrossRef ]
- Haque, I.; Basak, R. Land cover change detection using GIS and remote sensing techniques: A spatio-temporal study on Tanguar Haor, Sunamganj, Bangladesh. Egypt. J. Remote Sens. Space Sci. 2017 , 20 , 251–263. [ Google Scholar ] [ CrossRef ]
- Zhu, Z. ISPRS Journal of Photogrammetry and Remote Sensing Change detection using landsat time series: A review of frequencies, preprocessing, algorithms, and applications. ISPRS J. Photogramm. Remote Sens. 2017 , 130 , 370–384. [ Google Scholar ] [ CrossRef ]
- Verbesselt, J.; Hyndman, R.; Newnham, G.; Culvenor, D. Detecting trend and seasonal changes in satellite image time series. Remote Sens. Environ. 2010 , 114 , 106–115. [ Google Scholar ] [ CrossRef ]
- Qin, D.; Zhou, X.; Zhou, W.; Huang, G.; Ren, Y.; Horan, B.; He, J.; Kito, N. MSIM: A change detection framework for damage assessment in natural disasters. Expert Syst. Appl. 2018 , 97 , 372–383. [ Google Scholar ] [ CrossRef ]
- Guttler, F.; Ienco, D.; Nin, J.; Teisseire, M.; Poncelet, P. A graph-based approach to detect spatiotemporal dynamics in satellite image time series. ISPRS J. Photogramm. Remote Sens. 2017 , 130 , 92–107. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Genitha, C.H.; Rajesh, R.K. A technique for multi-spectral satellite image compression using EZW algorithm. In Proceedings of the 2016 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), Kumaracoil, India, 16–17 December 2016; Volume 1, pp. 689–694. [ Google Scholar ]
- Sahnoun, K.; Benabadji, N. Satellite image compression algorithm based on the FFT. Int. J. Multimed. Its Appl. 2014 , 6 , 301–310. [ Google Scholar ] [ CrossRef ]
- Liao, L.; Xiao, J.; Li, Y.; Wang, M.; Hu, R. Learned representation of satellite image series for data compression. Remote Sens. 2020 , 12 , 497. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Shihab, H.S.; Shafie, S.; Ramli, A.R.; Ahmad, F. Enhancement of Satellite Image Compression Using a Hybrid (DWT–DCT) Algorithm. Sens. Imaging 2017 , 18 , 30. [ Google Scholar ] [ CrossRef ]
- Hagag, A.; Hassan, E.S.; Amin, M.; Abd El-Samie, F.E.; Fan, X. Satellite multispectral image compression based on removing sub-bands. Optik 2017 , 131 , 1023–1035. [ Google Scholar ] [ CrossRef ]
- Shi, C.; Zhang, J.; Zhang, Y. Content-Based Onboard Compression for Remote Sensing Images. Neurocomputing 2016 , 191 , 330–340. [ Google Scholar ] [ CrossRef ]
- Marchetti, Y.; Nguyen, H.; Braverman, A.; Cressie, N. Spatial data compression via adaptive dispersion clustering. Comput. Stat. Data Anal. 2018 , 117 , 138–153. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Kong, W.; Wu, J.; Hu, Z.; Anisetti, M.; Damiani, E.; Jeon, G. Lossless compression for aurora spectral images using fast online bi-dimensional decorrelation method. Inf. Sci. 2017 , 381 , 33–45. [ Google Scholar ] [ CrossRef ]
- Xiao, B.; Lu, G.; Zhang, Y.; Li, W.; Wang, G. Lossless image compression based on integer Discrete Tchebichef Transform. Neurocomputing 2016 , 214 , 587–593. [ Google Scholar ] [ CrossRef ]
- Medjahed, S.A.; Ait Saadi, T.; Benyettou, A.; Ouali, M. A new post-classification and band selection frameworks for hyperspectral image classification. Egypt. J. Remote Sens. Space Sci. 2016 , 19 , 163–173. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Ming, D.; Zhou, T.; Wang, M.; Tan, T. Land cover classification using random forest with genetic algorithm-based parameter optimization. J. Appl. Remote Sens. 2016 , 10 , 035021. [ Google Scholar ] [ CrossRef ]
- Thakur, N.; Maheshwari, D. A review of image classification techniques. Int. Res. J. Eng. Technol. 2017 , 4 , 1588–1591. [ Google Scholar ]
- Tian, S.; Zhang, X.; Tian, J.; Sun, Q. Random forest classification of wetland landcovers from multi-sensor data in the arid region of Xinjiang, China. Remote Sens. 2016 , 8 , 954. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Villa, A.; Chanussot, J.; Benediktsson, J.A.; Jutten, C. Spectral unmixing for the classification of hyperspectral images at a finer spatial resolution. IEEE J. Sel. Top. Signal. Process. 2011 , 5 , 521–533. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Baghbaderani, R.K.; Wang, F.; Stutts, C.; Qu, Y.; Qi, H. Hybrid Spectral Unmixing in Land-Cover Classification. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 3009–3012. [ Google Scholar ]
- Degerickx, J.; Roberts, D.A.; Somers, B. Enhancing the performance of Multiple Endmember Spectral Mixture Analysis (MESMA) for urban land cover mapping using airborne lidar data and band selection. Remote Sens. Environ. 2019 , 221 , 260–273. [ Google Scholar ] [ CrossRef ]
- Singh, A.; Singh, K.K. Satellite image classification using Genetic Algorithm trained radial basis function neural network, application to the detection of flooded areas. J. Vis. Commun. Image Represent. 2017 , 42 , 173–181. [ Google Scholar ] [ CrossRef ]
- Ma, L.; Li, M.; Ma, X.; Cheng, L.; Du, P.; Liu, Y. A review of supervised object-based land-cover image classification. ISPRS J. Photogramm. Remote Sens. 2017 , 130 , 277–293. [ Google Scholar ] [ CrossRef ]
- Liu, Q.; Hang, R.; Song, H.; Li, Z. Learning multiscale deep features for high-resolution satellite image scene classification. IEEE Trans. Geosci. Remote Sens. 2018 , 56 , 117–126. [ Google Scholar ] [ CrossRef ]
- Alonso-Montesinos, J.; Martínez-Durbán, M.; del Sagrado, J.; del Águila, I.M.; Batlles, F.J. The application of Bayesian network classifiers to cloud classification in satellite images. Renew. Energy 2016 , 97 , 155–161. [ Google Scholar ] [ CrossRef ]
- Feizizadeh, B. A novel approach of fuzzy Dempster-Shafer theory for spatial uncertainty analysis and accuracy assessment of object-based image classification. IEEE Geosci. Remote Sens. Lett. 2018 , 15 , 18–22. [ Google Scholar ] [ CrossRef ]
- Dhivya, R.; Prakash, R. Edge detection of satellite image using fuzzy logic. Cluster Comput. 2019 , 22 , 11891–11898. [ Google Scholar ] [ CrossRef ]
- Persello, C.; Stein, A. Deep Fully Convolutional Networks for the Detection of Informal Settlements in VHR Images. IEEE Geosci. Remote Sens. Lett. 2017 , 14 , 2325–2329. [ Google Scholar ] [ CrossRef ]
- Lin, C.H.; Lin, B.Y.; Lee, K.Y.; Chen, Y.C. Radiometric normalization and cloud detection of optical satellite images using invariant pixels. ISPRS J. Photogramm. Remote Sens. 2015 , 106 , 107–117. [ Google Scholar ] [ CrossRef ]
- Liasis, G.; Stavrou, S. Satellite images analysis for shadow detection and building height estimation. ISPRS J. Photogramm. Remote Sens. 2016 , 119 , 437–450. [ Google Scholar ] [ CrossRef ]
- Wu, T.; Hu, X.; Zhang, Y.; Zhang, L.; Tao, P.; Lu, L. Automatic cloud detection for high resolution satellite stereo images and its application in terrain extraction. ISPRS J. Photogramm. Remote Sens. 2016 , 121 , 143–156. [ Google Scholar ] [ CrossRef ]
- Wang, M.; Fan, C.; Pan, J.; Jin, S.; Chang, X. Image jitter detection and compensation using a high-frequency angular displacement method for Yaogan-26 remote sensing satellite. ISPRS J. Photogramm. Remote Sens. 2017 , 130 , 32–43. [ Google Scholar ] [ CrossRef ]
- Mera, D.; Bolon-Canedo, V.; Cotos, J.M.; Alonso-Betanzos, A. On the use of feature selection to improve the detection of sea oil spills in SAR images. Comput. Geosci. 2017 , 100 , 166–178. [ Google Scholar ] [ CrossRef ]
- Cao, L.; Wang, C.; Li, J. Vehicle detection from highway satellite images via transfer learning. Inf. Sci. 2016 , 366 , 177–187. [ Google Scholar ] [ CrossRef ]
- Santra, A.K.; Christy, C.J. Genetic Algorithm and Confusion Matrix for Document Clustering. Int. J. Comput. Sci. Issues 2012 , 9 , 322–328. [ Google Scholar ]
- Novaković, J.D.; Veljović, A.; Ilić, S.S.; Papić, Z.; Milica, T. Evaluation of Classification Models in Machine Learning. Theory Appl. Math. Comput. Sci. 2017 , 7 , 39–46. [ Google Scholar ]
- Foody, G. Local characterization of thematic classification accuracy through spatially constrained confusion matrices. Int. J. Remote 2005 , 26 , 1217–1228. [ Google Scholar ] [ CrossRef ]
- Andualem, T.G.; Belay, G.; Guadie, A. Land Use Change Detection Using Remote Sensing Technology. J. Earth Sci. Clim. Chang. 2018 , 9 , 1–6. [ Google Scholar ] [ CrossRef ]
- Olofsson, P.; Foody, G.M.; Herold, M.; Stehman, S.V.; Woodcock, C.E.; Wulder, M.A. Good practices for estimating area and assessing accuracy of land change. Remote Sens. Environ. 2014 , 148 , 42–57. [ Google Scholar ] [ CrossRef ]
- Ariza-López, F.J.; Rodríguez-Avi, J.; Alba-Fernández, M.V.; García-Balboa, J.L. Thematic accuracy quality control by means of a set of multinomials. Appl. Sci. 2019 , 9 , 4240. [ Google Scholar ] [ CrossRef ] [ Green Version ]
- Strahler, A.H.; Boschetti, L.; Foody, G.M.; Friedl, M.A.; Hansen, M.C.; Herold, M.; Mayaux, P.; Morisette, J.T.; Stehman, S.V.; Woodcock, C.E. Global Land Cover Validation: Recommendations for Evaluation and Accuracy Assessment of Global Land Cover Maps ; Technical Report of Institute of Environmental Sustainability; Joint Reasearch Center: Ispra, Italy, 2006; pp. 1–60. [ Google Scholar ]
Method | Test Images | Performance Metrics |
---|---|---|
1. Histogram Planting [ ] | Phobos Images of aircraft | Mean, Average Information |
2. Modified differential evolution [ ] | Images from NASA, Satpalda Geospatial Services and Satellite Imaging Corp. | Peak Signal to Noise Ratio (PSNR), Mean Square Error (MSE), Universal Quality Index (UQI), Normalized Absolute Error (NAE), Structural Contrast-Quality Index (SC-QI) |
3. Fractional Differential (FD) unsharp masking [ ] | Infrared Images | Average brightness (B), Average contrast (V), Average discrete information content (or entropy, E), Sharpness (S), Colorfulness (C), Correlation (R), Energy (E) and Homogeneity (M) |

Method | Test Images | Feature Extracted | Performance Metrics |
---|---|---|---|
1. Uniform Competency Feature Extraction [ ] | Synthetic Images: ETM+, World View 2, Real Images: SPOT4, SPOT6, Quick Bird, GeoEye1, ASTER | Rotation and scale invariant Local features | Recall, Precision, Root Mean Square Error |
2. RepTree, Machine Learning and Euclidean distance [ ] | Envisat images | Continuous features such as roads and rivers | Efficiency and processing time |
3. Multi-image saliency analysis [ ] | SPOT 5 images | ROI extraction such as clouds | Recall = 0.8, precision = 0.75, F-measure = 0.75 |
4. Digital Surface Models [ ] | ZY-3 satellite images | Pixel and feature level extraction of urban scenes | Overall accuracy = 96%, kappa coefficient = 0.862 |
5. Reversible jump Markov chain Monte Carlo sampler [ ] | Synthetic image | Extraction of rivers, channels and roads | Completeness = 98.8%, correctness = 94% and quality = 92.9% |

Method | Test Images | Performance Metrics |
---|---|---|
1. Cuckoo Search, McCulloch’s method [ ] | Pleiades satellite images | PSNR = 21.9833, MSE = 411.8573, FSIM = 0.9595 |
2. Markov Random Field method [ ] | QuickBird images with four spectral bands | F-measure = 0.7427, Precision = 0.7170, Recall = 0.7704 |
3. Deep Convolutional Neural Network [ ] | IR images | Accuracy = 89.9%, Confusion matrix |
4. Levy flight firefly algorithm [ ] | Multiband satellite images from NASA | FSIM = 0.66278, PSNR = 13.7037, MSE = 3376.1616, SSIM = 0.92448 |
5. Graph based segmentation, Gabor filter [ ] | QuickBird satellite images | Completeness = 92.5%, Correctness = 91%, Quality = 84.7% |

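
Several metrics in the tables above are linked by standard formulas, so reported values can be sanity-checked. The sketch below assumes 8-bit imagery (peak value 255) for PSNR, the usual F-measure (harmonic mean of precision and recall), and the common definitions completeness = TP/(TP+FN), correctness = TP/(TP+FP) and quality = TP/(TP+FP+FN), which give quality = 1/(1/completeness + 1/correctness − 1). The function names are illustrative and not taken from the cited works.

```python
import math


def psnr_from_mse(mse, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit imagery."""
    return 10.0 * math.log10(peak ** 2 / mse)


def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


def quality(completeness, correctness):
    """Quality = TP / (TP + FP + FN), expressed through
    completeness = TP / (TP + FN) and correctness = TP / (TP + FP)."""
    return 1.0 / (1.0 / completeness + 1.0 / correctness - 1.0)


# Recompute derived values from the inputs reported in the tables.
print(psnr_from_mse(411.8573))     # Cuckoo Search row, reported PSNR = 21.9833
print(f_measure(0.7170, 0.7704))   # Markov Random Field row, reported F = 0.7427
print(quality(0.925, 0.91))        # graph-based segmentation row, reported 84.7%
print(quality(0.988, 0.94))        # reversible jump MCMC row, reported 92.9%
```

Under these assumptions the four rows checked here are internally consistent; a row that fails such a check may use a different peak value or averaging scheme rather than being wrong.
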
Method | Image Dataset | Mutual Information | Q | Visual Information Fidelity |
---|---|---|---|---|
1. Image cartoon texture decomposition and sparse coding [ ] | NIR and visible IKONOS images | – | 0.357 | 0.3470 |
2. Contourlet Transform [ ] | SAR and panchromatic images | 9.342 | – | – |
3. Expectation Maximization algorithm [ ] | Landsat image set | – | 0.589 | – |
4. Dictionary learning method [ ] | Visible and IR image set | 5.778 | 0.660 | 0.578 |

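
Mutual information in the fusion table measures how much information the fused image shares with its inputs. A minimal sketch, assuming images are given as flat lists of quantized grey levels and that the fusion score follows the common convention MI(A, F) + MI(B, F); the cited methods may bin or normalize differently, so values from this sketch are not directly comparable with the table.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information in bits between two equally sized images,
    given as flat lists of quantized grey levels."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    n = len(a)
    joint = Counter(zip(a, b))  # joint histogram of grey-level pairs
    count_a = Counter(a)        # marginal histogram of image A
    count_b = Counter(b)        # marginal histogram of image B
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


def fusion_mi(source_a, source_b, fused):
    """Assumed fusion convention: MI(A, F) + MI(B, F)."""
    return mutual_information(source_a, fused) + mutual_information(source_b, fused)


visible = [0, 0, 1, 1]   # hypothetical 2x2 visible image, flattened
infrared = [0, 1, 0, 1]  # hypothetical 2x2 infrared image, flattened
fused = [0, 1, 2, 3]     # hypothetical fused image that preserves both inputs
print(mutual_information(visible, infrared))
print(fusion_mi(visible, infrared, fused))
```

In this toy case the two sources are statistically independent (MI of 0 bits), while the fused image fully determines each source, so it scores 1 bit against each.
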
Method | Test Images | Performance Metrics |
---|---|---|
1. DWT-DCT [ ] | Landsat 8 images | PSNR, Compression Ratio |
2. DWT, entropy encoder [ ] | ETM+ satellite multispectral images | PSNR, Spectral Angle Mapper (SAM) |
3. Content based compression [ ] | SAR image | PSNR, Mean Absolute Error (MAE) |
4. Adaptive dispersion clustering based compression [ ] | AMSR-2 dataset | Mean Prediction Error Variance Ratio (MPEVR), Root Mean Square Prediction Error (RMSPE) |

Method | Test Images | Application | Performance Metrics |
---|---|---|---|
1. Fuzzy logic based detection [ ] | Hyperspectral image | Edge detection | PSNR, MSE |
2. Deep Convolutional Neural Network [ ] | QuickBird VHR image | Informal settlement detection | Overall accuracy |
3. Invariant pixel detection, PCA [ ] | Landsat 7 ETM+ images | Cloud detection | Accuracy and F-measure |
4. Active contour model [ ] | Google Earth images | Shadow detection and height estimation | Precision and recall |

Predicted (rows) / Actual (columns) | Water | Forest | Agriculture | Grass | Urban | Bareland | Total |
---|---|---|---|---|---|---|---|
Water | 60 | 0 | 0 | 0 | 0 | 0 | 60 |
Forest | 0 | 20 | 0 | 0 | 0 | 0 | 20 |
Agriculture | 0 | 0 | 20 | 0 | 0 | 0 | 20 |
Grass | 30 | 0 | 10 | 290 | 10 | 0 | 340 |
Urban | 0 | 0 | 0 | 0 | 90 | 0 | 90 |
Bareland | 10 | 20 | 60 | 50 | 50 | 750 | 940 |
Total | 100 | 40 | 90 | 340 | 150 | 750 | 1470 |
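
The usual accuracy measures can be read directly off this matrix. The sketch below assumes rows are predicted (map) classes and columns are actual (reference) classes. Overall accuracy is the diagonal sum divided by the total (1230/1470 ≈ 0.837), and Cohen's kappa corrects it for the chance agreement implied by the row and column totals (≈ 0.732 here).

```python
# Confusion matrix copied from the table above.
# Assumed layout: rows = predicted class, columns = actual class.
CLASSES = ["Water", "Forest", "Agriculture", "Grass", "Urban", "Bareland"]
MATRIX = [
    [60, 0, 0, 0, 0, 0],
    [0, 20, 0, 0, 0, 0],
    [0, 0, 20, 0, 0, 0],
    [30, 0, 10, 290, 10, 0],
    [0, 0, 0, 0, 90, 0],
    [10, 20, 60, 50, 50, 750],
]


def accuracy_report(m):
    k = len(m)
    n = sum(sum(row) for row in m)
    row_totals = [sum(row) for row in m]
    col_totals = [sum(m[i][j] for i in range(k)) for j in range(k)]
    diagonal = [m[i][i] for i in range(k)]
    overall = sum(diagonal) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    users = [d / r for d, r in zip(diagonal, row_totals)]      # per predicted class
    producers = [d / c for d, c in zip(diagonal, col_totals)]  # per actual class
    return overall, kappa, users, producers


overall, kappa, users, producers = accuracy_report(MATRIX)
print(f"overall accuracy = {overall:.4f}, kappa = {kappa:.4f}")
for name, ua, pa in zip(CLASSES, users, producers):
    print(f"{name:12s} user's = {ua:.3f}  producer's = {pa:.3f}")
```

If the matrix is actually laid out the other way round, user's and producer's accuracies swap, while overall accuracy and kappa are unchanged.
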
Asokan, A.; Anitha, J.; Ciobanu, M.; Gabor, A.; Naaji, A.; Hemanth, D.J. Image Processing Techniques for Analysis of Satellite Images for Historical Maps Classification—An Overview. Appl. Sci. 2020 , 10 , 4207. https://doi.org/10.3390/app10124207
Techniques for deep learning with satellite & aerial imagery
satellite-image-deep-learning/techniques
![satellite-image-deep-learning logo](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/logo.png)
👉 satellite-image-deep-learning.com 👈
Introduction
Deep learning has revolutionized the analysis and interpretation of satellite and aerial imagery, addressing unique challenges such as vast image sizes and a wide array of object classes. This repository provides an exhaustive overview of deep learning techniques specifically tailored for satellite and aerial image processing. It covers a range of architectures, models, and algorithms suited for key tasks like classification, segmentation, and object detection.
How to use this repository: use Command + F (Mac) or CTRL + F (Windows) to search this page for e.g. 'SAM'
Sections: classification, segmentation, object detection, cloud detection & removal, change detection, time series, crop classification, crop yield & vegetation forecasting, generative networks, autoencoders, dimensionality reduction, image embeddings & similarity search, few- & zero-shot learning, self-supervised, unsupervised & contrastive learning, large vision & language models (LLMs & LVMs), foundational models.
Classification
![Example images for land use classification](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/merced.png)
Classification is a fundamental task in remote sensing data analysis, where the goal is to assign a semantic label such as 'urban', 'forest' or 'agricultural land' to each image. Assigning labels to a whole image is known as image-level classification. However, a single image may contain several land cover types, such as a forest with a river running through it, or a city with both residential and commercial areas. In these cases image-level classification becomes a multi-label problem, in which several labels are assigned to one image. This can be accomplished by combining feature extraction with machine learning algorithms that identify each land cover type present. Image-level classification should not be confused with pixel-level classification, also known as semantic segmentation: image-level classification assigns one or more labels to an entire image, whereas semantic segmentation assigns a label to every individual pixel, producing a detailed map of where each land cover type occurs. Read A brief introduction to satellite image classification with neural networks
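In neural network classifiers, the difference between single-label and multi-label image-level classification often comes down to the output layer. A minimal sketch, assuming a model that emits one raw score (logit) per class: single-label prediction takes the argmax of a softmax, while multi-label prediction applies an independent sigmoid per class and keeps every class above a threshold (0.5 here, an assumption). The class names and logits are hypothetical.

```python
import math


def softmax(logits):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def single_label(logits, classes):
    """Image-level classification with exactly one label per image."""
    probs = softmax(logits)
    best = max(range(len(classes)), key=lambda i: probs[i])
    return classes[best]


def multi_label(logits, classes, threshold=0.5):
    """Multi-label classification: an independent sigmoid per class."""
    probs = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    return [c for c, p in zip(classes, probs) if p >= threshold]


classes = ["urban", "forest", "agricultural land", "river"]
logits = [-2.0, 3.1, -1.5, 1.2]  # hypothetical scores for one image tile
print(single_label(logits, classes))
print(multi_label(logits, classes))
```

For this tile the single-label classifier can only answer "forest", whereas the multi-label version also reports the river running through it.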
Land classification on Sentinel 2 data using a simple sklearn cluster algorithm or deep learning CNN
Multi-Label Classification of Satellite Photos of the Amazon Rainforest using keras or FastAI
EuroSat-Satellite-CNN-and-ResNet -> Classifying custom image datasets by creating Convolutional Neural Networks and Residual Networks from scratch with PyTorch
Detecting Informal Settlements from Satellite Imagery using fine-tuning of ResNet-50 classifier with repo
Land-Cover-Classification-using-Sentinel-2-Dataset -> well-written Medium article accompanying this repo but using the EuroSAT dataset
Land Cover Classification of Satellite Imagery using Convolutional Neural Networks using Keras and a multi spectral dataset captured over vineyard fields of Salinas Valley, California
Detecting deforestation from satellite images -> using FastAI and ResNet50, with repo fsdl_deforestation_detection
Neural Network for Satellite Data Classification Using Tensorflow in Python -> A step-by-step guide for Landsat 5 multispectral data classification for binary built-up/non-built-up class prediction, with repo
Slums mapping from pretrained CNN network on VHR (Pleiades: 0.5m) and MR (Sentinel: 10m) imagery
Comparing urban environments using satellite imagery and convolutional neural networks -> includes interesting study of the image embedding features extracted for each image on the Urban Atlas dataset
RSI-CB -> A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data. See also Remote-sensing-image-classification
NAIP_PoolDetection -> modelled as an object recognition problem, a CNN is used to identify images as being swimming pools or something else - specifically a street, rooftop, or lawn
Land Use and Land Cover Classification using a ResNet Deep Learning Architecture -> uses fastai and the EuroSAT dataset
Vision Transformers Use Case: Satellite Image Classification without CNNs
WaterNet -> a CNN that identifies water in satellite images
Road-Network-Classification -> Road network classification model using ResNet-34, road classes organic, gridiron, radial and no pattern
Scaling AI to map every school on the planet
Landsat classification CNN tutorial with repo
satellite-crosswalk-classification
Understanding the Amazon Rainforest with Multi-Label Classification + VGG-19, Inceptionv3, AlexNet & Transfer Learning
Implementation of the 3D-CNN model for land cover classification -> uses the Sundarbans dataset, with repo
SSTN -> Spectral-Spatial Transformer Network for Hyperspectral Image Classification: A FAS Framework
SatellitePollutionCNN -> A novel algorithm to predict air pollution levels with state-of-the-art accuracy using deep learning and Google Maps satellite images
PropertyClassification -> Classifying the type of property given Real Estate, satellite and Street view Images
remote-sense-quickstart -> classification on a number of datasets, including with attention visualization
Satellite image classification using multiple machine learning algorithms
satsense -> land use/cover classification using classical features including HoG & NDVI
PyTorch_UCMerced_LandUse
EuroSAT-image-classification
landcover_classification -> using fast.ai on EuroSAT
IGARSS2020_BWMS -> Band-Wise Multi-Scale CNN Architecture for Remote Sensing Image Scene Classification with a novel CNN architecture for the feature embedding of high-dimensional RS images
image.classification.on.EuroSAT -> solution in pure pytorch
hurricane_damage -> Post-hurricane structure damage assessment based on aerial imagery
openai-drivendata-challenge -> Using deep learning to classify the building material of rooftops (aerial imagery from South America)
is-it-abandoned -> Can we tell if a house is abandoned based on aerial LIDAR imagery?
BoulderAreaDetector -> CNN to classify whether a satellite image shows an area that would be a good rock climbing spot or not
ISPRS_S2FL -> Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model
Brazilian-Coffee-Detection -> uses Keras with public dataset
tf-crash-severity -> predict the crash severity for given road features contained within satellite images
ensemble_LCLU -> Deep neural network ensembles for remote sensing land cover and land use classification
cerraNet -> contextually classify the types of use and coverage in the Brazilian Cerrado
Urban-Analysis-Using-Satellite-Imagery -> classify urban area as planned or unplanned using a combination of segmentation and classification
ChipClassification -> Deep learning for multi-modal classification of cloud, shadow and land cover scenes in PlanetScope and Sentinel-2 imagery
DeeplearningClassficationLandsat-tImages -> Water/Ice/Land Classification Using Large-Scale Medium Resolution Landsat Satellite Images
wildfire-detection-from-satellite-images-ml -> detect whether an image contains a wildfire, with example flask web app
mining-discovery-with-deep-learning -> Mining and Tailings Dam Detection in Satellite Imagery Using Deep Learning
e-Farmerce-platform -> classify crop type
sentinel2-deep-learning -> Novel Training Methodologies for Land Classification of Sentinel-2 Imagery
RSSC-transfer -> The Role of Pre-Training in High-Resolution Remote Sensing Scene Classification
Classifying Geo-Referenced Photos and Satellite Images for Supporting Terrain Classification -> detect floods
Pay-More-Attention -> Remote Sensing Image Scene Classification Based on an Enhanced Attention Module
Remote Sensing Image Classification via Improved Cross-Entropy Loss and Transfer Learning Strategy Based on Deep Convolutional Neural Networks
DenseNet40-for-HRRSISC -> DenseNet40 for remote sensing image scene classification, uses UC Merced Dataset
SKAL -> Looking Closer at the Scene: Multiscale Representation Learning for Remote Sensing Image Scene Classification
potsdam-tensorflow-practice -> image classification of Potsdam dataset using tensorflow
SAFF -> Self-Attention-Based Deep Feature Fusion for Remote Sensing Scene Classification
GLNET -> Convolutional Neural Networks Based Remote Sensing Scene Classification under Clear and Cloudy Environments
Remote-sensing-image-classification -> transfer learning using pytorch to classify remote sensing data into three classes: aircrafts, ships, none
remote_sensing_pretrained_models -> as an alternative to fine-tuning models pretrained on ImageNet, here some CNNs are pretrained on the RSD46-WHU & AID datasets
CNN_AircraftDetection -> CNN for aircraft detection in satellite images using keras
OBIC-GCN -> Object-based Classification Framework of Remote Sensing Images with Graph Convolutional Networks
aitlas-arena -> An open-source benchmark framework for evaluating state-of-the-art deep learning approaches for image classification in Earth Observation (EO)
droughtwatch -> Satellite-based Prediction of Forage Conditions for Livestock in Northern Kenya
JSTARS_2020_DPN-HRA -> Deep Prototypical Networks With Hybrid Residual Attention for Hyperspectral Image Classification
SIGNA -> Semantic Interleaving Global Channel Attention for Multilabel Remote Sensing Image Classification
Satellite Image Classification using rmldnn and Sentinel 2 data
PBDL -> Patch-Based Discriminative Learning for Remote Sensing Scene Classification
EmergencyNet -> identify fire and other emergencies from a drone
satellite-deforestation -> Using Satellite Imagery to Identify the Leading Indicators of Deforestation, applied to the Kaggle Challenge Understanding the Amazon from Space
RSMLC -> Deep Network Architectures as Feature Extractors for Multi-Label Classification of Remote Sensing Images
FireRisk -> A Remote Sensing Dataset for Fire Risk Assessment with Benchmarks Using Supervised and Self-supervised Learning
flood_susceptibility_mapping -> Towards urban flood susceptibility mapping using data-driven models in Berlin, Germany
tick-tick-bloom -> Winners of the Tick Tick Bloom: Harmful Algal Bloom Detection Challenge. Task was to predict severity of algae bloom, winners used decision trees
Estimating coal power plant operation from satellite images with computer vision -> use Sentinel 2 data to identify if a coal power plant is on or off, with dataset and repo
Building-detection-and-roof-type-recognition -> A CNN-Based Approach for Automatic Building Detection and Recognition of Roof Types Using a Single Aerial Image
Performance Comparison of Multispectral Channels for Land Use Classification -> Implemented ResNet-50, ResNet-101, ResNet-152, Vision Transformer on RGB and multispectral versions of EuroSAT dataset.
SNN4Space -> project which investigates the feasibility of deploying spiking neural networks (SNN) in land cover and land use classification tasks
vessel-classification -> classify vessels and identify fishing behavior based on AIS data
RSMamba -> Remote Sensing Image Classification with State Space Model
BirdSAT -> Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping
EGNNA_WND -> Estimating the presence of the West Nile Disease employing Graph Neural network
cyfi -> Estimate cyanobacteria density based on Sentinel-2 satellite imagery
![Segmentation example](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/segmentation.png)
Image segmentation is a crucial step in image analysis and computer vision, with the goal of dividing an image into semantically meaningful segments or regions. Segmentation assigns a class label to each pixel, so the output is a 2D grid of class labels with the same dimensions as the input image. One common application is road or building segmentation, where the goal is to separate roads or buildings from the other features in an image. For this task, single-class models are often trained to distinguish roads from background, or buildings from background. These models learn features such as color, texture and shape that are characteristic of roads or buildings, and use them to assign a class label to each pixel. Another common application is land use or crop type classification, where the goal is to map the different land cover types within an image. Here multi-class models are typically used to distinguish between several classes, such as forests, urban areas and agricultural land, which captures how these land cover types are arranged relative to one another and gives a more complete picture of the image content. Read A brief introduction to satellite image segmentation with neural networks. Note that many articles that refer to 'hyperspectral land classification' are actually describing semantic segmentation. Image source
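A segmentation network typically outputs one score map per class, and the grid of class labels described above is obtained by taking, at each pixel, the class with the highest score. The sketch below assumes scores are nested lists indexed as scores[class][row][col], and adds intersection over union (IoU), one common per-class evaluation metric, purely as an illustration. The two-class road/background tile is hypothetical.

```python
def argmax_mask(scores):
    """Turn per-class score maps scores[c][y][x] into a label mask mask[y][x]."""
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def iou(pred, target, cls):
    """Intersection over union of one class between two label masks."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += int(p == cls and t == cls)
            union += int(p == cls or t == cls)
    return inter / union if union else float("nan")


# Hypothetical 2x3 tile with two classes: 0 = background, 1 = road.
scores = [
    [[0.9, 0.2, 0.1], [0.8, 0.7, 0.3]],  # background scores
    [[0.1, 0.8, 0.9], [0.2, 0.3, 0.7]],  # road scores
]
target = [[0, 1, 1], [0, 1, 1]]  # hypothetical reference mask
pred = argmax_mask(scores)
print(pred)
print(iou(pred, target, cls=1))
```

A single-class road model is the two-class case shown here; a multi-class land cover model simply has more score maps, and IoU is then usually reported per class.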
Segmentation - Land use & land cover
U-Net for Semantic Segmentation on Unbalanced Aerial Imagery -> using the Dubai dataset
Semantic Segmentation of Dubai dataset Using a TensorFlow U-Net Model
nga-deep-learning -> performs semantic segmentation on high-resolution GeoTIFF data using a modified U-Net & Keras, published by NASA researchers
Automatic Detection of Landfill Using Deep Learning
SpectralNET -> a 2D wavelet CNN for Hyperspectral Image Classification, uses Salinas Scene dataset & Keras
laika -> The goal of this repo is to research potential sources of satellite image data and to implement various algorithms for satellite image segmentation
PEARL -> a human-in-the-loop AI tool to drastically reduce the time required to produce an accurate Land Use/Land Cover (LULC) map, blog post, uses Microsoft Planetary Computer and ML models run locally in the browser. Code for backend and frontend
Land Cover Classification with U-Net -> Satellite Image Multi-Class Semantic Segmentation Task with PyTorch Implementation of U-Net, uses DeepGlobe Land Cover Segmentation dataset, with code
Multi-class semantic segmentation of satellite images using U-Net using DSTL dataset, tensorflow 1 & python 2.7. Accompanying article
Codebase for multi-class land cover classification with U-Net accompanying a master's thesis, uses Keras
dubai-satellite-imagery-segmentation -> due to the small dataset, image augmentation was used
CDL-Segmentation -> Deep Learning Based Land Cover and Crop Type Classification: A Comparative Study. Compares UNet, SegNet & DeepLabv3+
LoveDA -> A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation
Satellite Imagery Semantic Segmentation with CNN -> 7 different segmentation classes, DeepGlobe Land Cover Classification Challenge dataset, with repo
Aerial Semantic Segmentation using U-Net Deep Learning Model medium article, with repo
UNet-Satellite-Image-Segmentation -> A Tensorflow implementation of light UNet semantic segmentation framework
DeepGlobe Land Cover Classification Challenge solution
Semantic-segmentation-with-PyTorch-Satellite-Imagery -> predict 25 classes on RGB imagery taken to assess the damage after Hurricane Harvey
Semantic Segmentation With Sentinel-2 Imagery -> uses LandCoverNet dataset and fast.ai
CNN_Enhanced_GCN -> CNN-Enhanced Graph Convolutional Network With Pixel- and Superpixel-Level Feature Fusion for Hyperspectral Image Classification
LULCMapping-WV3images-CORINE-DLMethods -> Land Use and Land Cover Mapping Using Deep Learning Based Segmentation Approaches and VHR Worldview-3 Images
MCANet -> A joint semantic segmentation framework of optical and SAR images for land use classification. Uses WHU-OPT-SAR-dataset
land-cover -> Model Generalization in Deep Learning Applications for Land Cover Mapping
generalizablersc -> Cross-dataset Learning for Generalizable Land Use Scene Classification
Large-scale-Automatic-Identification-of-Urban-Vacant-Land -> Large-scale automatic identification of urban vacant land using semantic segmentation of high-resolution remote sensing images
SSLTransformerRS -> Self-supervised Vision Transformers for Land-cover Segmentation and Classification
aerial-tile-segmentation -> Large satellite image semantic segmentation into 6 classes using Tensorflow 2.0 and ISPRS benchmark dataset
DCSA-Net -> Dynamic Convolution Self-Attention Network for Land-Cover Classification in VHR Remote-Sensing Images
CHeGCN-CNN_enhanced_Heterogeneous_Graph -> CNN-Enhanced Heterogeneous Graph Convolutional Network: Inferring Land Use from Land Cover with a Case Study of Park Segmentation
TCSVT_2022_DGSSC -> DGSSC: A Deep Generative Spectral-Spatial Classifier for Imbalanced Hyperspectral Imagery
DeepForest-Wetland-Paper -> Deep Forest classifier for wetland mapping using the combination of Sentinel-1 and Sentinel-2 data, GIScience & Remote Sensing
Wetland_UNet -> UNet models that can delineate wetlands using remote sensing data input including bands from Sentinel-2, LiDAR and geomorphons. By the Conservation Innovation Center of Chesapeake Conservancy and Defenders of Wildlife
DPA -> DPA is an unsupervised domain adaptation (UDA) method applied to different satellite images for large-scale land cover mapping.
dynamicworld -> Dynamic World, Near real-time global 10 m land use land cover mapping
spada -> Land Cover Segmentation with Sparse Annotations from Sentinel-2 Imagery
M3SPADA -> Multi-Sensor Temporal Unsupervised Domain Adaptation for Land Cover Mapping with spatial pseudo labelling and adversarial learning
GLNet -> Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images
LoveNAS -> LoveNAS: Towards Multi-Scene Land-Cover Mapping via Hierarchical Searching Adaptive Network
FLAIR-2 challenge -> Semantic segmentation and domain adaptation challenge proposed by the French National Institute of Geographical and Forest Information (IGN)
flair-2 8th place solution
Segmentation - Water, coastlines, rivers & floods
pytorch-waterbody-segmentation -> UNET model trained on the Satellite Images of Water Bodies dataset from Kaggle. The model is deployed on Hugging Face Spaces
Flood Detection and Analysis using UNET with Resnet-34 as the backbone, uses fastai
Automatic Flood Detection from Satellite Images Using Deep Learning
UNSOAT used fastai to train a Unet to perform semantic segmentation on satellite imagery to detect water
Semi-Supervised Classification and Segmentation on High Resolution Aerial Images - Solving the FloodNet problem
Houston_flooding -> labeling each pixel as either flooded or not using data from Hurricane Harvey. The dataset consisted of pre- and post-flood images, and a ground-truth floodwater mask was created using unsupervised clustering (with DBSCAN) of image pixels with human cluster verification/adjustment
ml4floods -> An ecosystem of data, models and code pipelines to tackle flooding with ML
A comprehensive guide to getting started with the ETCI Flood Detection competition -> using Sentinel-1 SAR & pytorch
Map Floodwater of SAR Imagery with SageMaker -> applied to Sentinel-1 dataset
1st place solution for STAC Overflow: Map Floodwater from Radar Imagery hosted by Microsoft AI for Earth -> combines a Unet with a CatBoostClassifier, taking the maximum of their predictions rather than the average
hydra-floods -> an open source Python application for downloading, processing, and delivering surface water maps derived from remote sensing data
CoastSat -> tool for mapping coastlines which has an extension CoastSeg using segmentation models
Satellite_Flood_Segmentation_of_Harvey -> explores both deep learning and traditional k-means clustering
Flood Event Detection Utilizing Satellite Images
ETCI-2021-Competition-on-Flood-Detection -> Experiments on Flood Segmentation on Sentinel-1 SAR Imagery with Cyclical Pseudo Labeling and Noisy Student Training
FDSI -> Flood Detection in Satellite Images - 2017 Multimedia Satellite Task
deepwatermap -> a deep model that segments water on multispectral images
rivamap -> an automated river analysis and mapping engine
deep-water -> track changes in water level
WatNet -> A deep ConvNet for surface water mapping based on Sentinel-2 image, uses the Earth Surface Water Dataset
A-U-Net-for-Flood-Extent-Mapping
floatingobjects -> Towards Detecting Floating Objects on a Global Scale with Learned Spatial Features Using Sentinel 2. Uses U-Net & pytorch
SpaceNet8 -> baseline Unet solution to detect flooded roads and buildings
dlsim -> Breaking the Limits of Remote Sensing by Simulation and Deep Learning for Flood and Debris Flow Mapping
Water-HRNet -> HRNet trained on Sentinel 2
semantic segmentation model to identify newly developed or flooded land using NAIP imagery provided by the Chesapeake Conservancy, training on MS Azure
BandNet -> Analysis and application of multispectral data for water segmentation using machine learning. Uses Sentinel-2 data
mmflood -> MMFlood: A Multimodal Dataset for Flood Delineation From Satellite Imagery (Sentinel 1 SAR)
Urban_flooding -> Towards transferable data-driven models to predict urban pluvial flood water depth in Berlin, Germany
Flood-Mapping-Using-Satellite-Images -> master's thesis comparing Random Forest & Unet
MECNet -> Rich CNN features for water-body segmentation from very high resolution aerial and satellite imagery
SWRNET -> A Deep Learning Approach for Small Surface Water Area Recognition Onboard Satellite
elwha-segmentation -> fine-tuning Meta's Segment Anything (SAM) for bird's eye view river pixel segmentation, with Medium article
RiverSnap -> code for paper: A Comparative Performance Analysis of Popular Deep Learning Models and Segment Anything Model (SAM) for River Water Segmentation in Close-Range Remote Sensing Imagery
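The max-fusion idea behind the STAC Overflow 1st place entry above can be sketched as follows. This is a minimal illustration, not that solution's code: it assumes both models already output per-pixel water probabilities on the same grid, and the function name `max_ensemble` and the 0.5 threshold are hypothetical choices.

```python
import numpy as np

def max_ensemble(prob_a: np.ndarray, prob_b: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Fuse two per-pixel water probability maps by element-wise maximum, then threshold."""
    if prob_a.shape != prob_b.shape:
        raise ValueError("probability maps must have the same shape")
    fused = np.maximum(prob_a, prob_b)  # a pixel is water if either model is confident
    return fused >= threshold

# Toy probabilities standing in for a segmentation model and a per-pixel classifier (made up)
unet_prob = np.array([[0.9, 0.2], [0.4, 0.1]])
catboost_prob = np.array([[0.3, 0.7], [0.45, 0.05]])

mask = max_ensemble(unet_prob, catboost_prob)            # [[True, True], [False, False]]
mean_mask = (unet_prob + catboost_prob) / 2 >= 0.5       # [[True, False], [False, False]]
```

Taking the maximum biases the ensemble towards recall: one confident model is enough to label a pixel as water, whereas averaging lets an unsure model pull the score below the threshold.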
Segmentation - Fire, smoke & burn areas
SatelliteVu-AWS-Disaster-Response-Hackathon -> fire spread prediction using classical ML & deep learning
Wild Fire Detection using U-Net trained on Databricks & Keras, semantic segmentation
A Practical Method for High-Resolution Burned Area Monitoring Using Sentinel-2 and VIIRS
AI Geospatial Wildfire Risk Prediction -> A predictive model using geospatial raster data to assess wildfire hazard potential over the contiguous United States using Unet
IndustrialSmokePlumeDetection -> using Sentinel-2 & a modified ResNet-50
burned-area-detection -> uses Sentinel-2
rescue -> Attention to fires: multi-channel deep-learning models for wildfire severity prediction
smoke_segmentation -> Segmenting smoke plumes and predicting density from GOES imagery
wildfire-detection -> Using Vision Transformers for enhanced wildfire detection in satellite images
Burned_Area_Detection -> Detecting Burned Areas with Sentinel-2 data
burned-area-baseline -> baseline unet model accompanying the Satellite Burned Area Dataset (Sentinel 1 & 2)
burned-area-seg -> Burned area segmentation from Sentinel-2 using multi-task learning
chabud2023 -> Change detection for Burned area Delineation (ChaBuD) ECML/PKDD 2023 challenge
Post Wildfire Burnt-up Detection using Siamese-UNet -> on ChaBuD dataset
vit-burned-detection -> Vision transformers in burned area delineation
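Several burned-area entries above work with Sentinel-2. A common non-learned baseline to compare against is the differenced Normalized Burn Ratio, dNBR = NBR_pre - NBR_post, with NBR = (NIR - SWIR) / (NIR + SWIR). The sketch assumes co-registered reflectance arrays (for Sentinel-2 typically the B8A or B8 NIR band and the B12 SWIR band); the 0.1 cut-off and helper names are illustrative assumptions, not values taken from the listed repositories.

```python
import numpy as np

def nbr(nir: np.ndarray, swir: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Normalized Burn Ratio: (NIR - SWIR) / (NIR + SWIR); eps avoids division by zero."""
    nir = nir.astype(np.float64)
    swir = swir.astype(np.float64)
    return (nir - swir) / (nir + swir + eps)

def burned_mask(nir_pre, swir_pre, nir_post, swir_post, threshold=0.1):
    """Return dNBR and a mask of pixels whose NBR dropped by more than `threshold`."""
    dnbr = nbr(nir_pre, swir_pre) - nbr(nir_post, swir_post)
    return dnbr, dnbr > threshold

# Two made-up pixels: the first loses NIR and gains SWIR after a fire, the second is unchanged
nir_pre, swir_pre = np.array([0.4, 0.4]), np.array([0.2, 0.2])
nir_post, swir_post = np.array([0.2, 0.4]), np.array([0.3, 0.2])
dnbr, mask = burned_mask(nir_pre, swir_pre, nir_post, swir_post)
```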
Segmentation - Landslides
landslide-sar-unet -> Deep Learning for Rapid Landslide Detection using Synthetic Aperture Radar (SAR) Datacubes
landslide-mapping-with-cnn -> A new strategy to map landslides with a generalized convolutional neural network
Relict_landslides_CNN_kmeans -> Relict landslide detection in rainforest areas using a combination of k-means clustering algorithm and Deep-Learning semantic segmentation models
Landslide-mapping-on-SAR-data-by-Attention-U-Net -> Rapid Mapping of landslide on SAR data by Attention U-net
SAR-landslide-detection-pretraining -> SAR-based landslide classification pretraining leads to better segmentation
Landslide mapping from Sentinel-2 imagery through change detection
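Relict_landslides_CNN_kmeans pairs k-means clustering with deep segmentation. The clustering half can be sketched as plain Lloyd's k-means over per-pixel band values; how that repository combines the clusters with its CNN is not reproduced here, and `kmeans` / `cluster_image` are hypothetical helper names.

```python
import numpy as np

def kmeans(features: np.ndarray, k: int, n_iter: int = 50, seed: int = 0):
    """Lloyd's k-means on an (N, D) array. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)].astype(np.float64)
    for _ in range(n_iter):
        # Squared Euclidean distance from every pixel to every centroid: shape (N, k)
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its pixels; keep it in place if the cluster is empty
        new_centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

def cluster_image(image: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Cluster the pixels of an (H, W, C) image and return an (H, W) label map."""
    h, w, c = image.shape
    labels, _ = kmeans(image.reshape(-1, c).astype(np.float64), k, seed=seed)
    return labels.reshape(h, w)

# Toy two-band image: left half dark, right half bright
img = np.zeros((2, 4, 2))
img[:, 2:, :] = 10.0
labels = cluster_image(img, k=2)
```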
Segmentation - Glaciers
HED-UNet -> a model for simultaneous semantic segmentation and edge detection, examples provided are glacier fronts and building footprints using the Inria Aerial Image Labeling dataset
glacier_mapping -> Mapping glaciers in the Hindu Kush Himalaya, Landsat 7 images, Shapefile labels of the glaciers, Unet with dropout
glacier-detect-ML -> a simple logistic regression model to identify a glacier in Landsat satellite imagery
GlacierSemanticSegmentation
Antarctic-fracture-detection -> uses UNet with the MODIS Mosaic of Antarctica to detect surface fractures
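glacier-detect-ML uses a simple logistic regression. A minimal per-pixel version trained by batch gradient descent is sketched below, assuming each pixel is described by a small vector of band reflectances. The two-band toy values (ice is bright in the visible and dark in the shortwave infrared) are made up for illustration and do not come from the repository.

```python
import numpy as np

def train_logreg(x: np.ndarray, y: np.ndarray, lr: float = 1.0, epochs: int = 2000):
    """Gradient descent on mean binary cross-entropy. x: (N, D) features, y: (N,) in {0, 1}."""
    n, d = x.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted glacier probability
        grad = p - y                             # derivative of the loss w.r.t. the logit
        w -= lr * (x.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def predict(x: np.ndarray, w: np.ndarray, b: float, threshold: float = 0.5) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-(x @ w + b))) >= threshold

# Columns: (visible reflectance, SWIR reflectance); first three rows are "glacier" pixels
x = np.array([[0.8, 0.10], [0.7, 0.15], [0.9, 0.05],
              [0.2, 0.30], [0.1, 0.25], [0.25, 0.35]])
y = np.array([1, 1, 1, 0, 0, 0], dtype=float)
w, b = train_logreg(x, y)
is_glacier = predict(x, w, b)
```

In a real scene the same `predict` call would be applied to the reshaped (H*W, bands) pixel array and the result reshaped back to (H, W).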
Segmentation - Other environmental
Detection of Open Landfills -> uses Sentinel-2 to detect large changes in the Normalized Burn Ratio (NBR)
sea_ice_remote_sensing -> Sea Ice Concentration classification
Methane-detection-from-hyperspectral-imagery -> Deep Remote Sensing Methods for Methane Detection in Overhead Hyperspectral Imagery
methane-emission-project -> Classification CNNs were combined in an ensemble with traditional methods on tabular data
CH4Net -> A fast, simple model for detection of methane plumes using sentinel-2
EddyNet -> A Deep Neural Network For Pixel-Wise Classification of Oceanic Eddies
schisto-vegetation -> Deep Learning Segmentation of Satellite Imagery Identifies Aquatic Vegetation Associated with Snail Intermediate Hosts of Schistosomiasis in Senegal, Africa
Earthformer -> Exploring space-time transformers for earth system forecasting
weather4cast-2022 -> Unet-3D baseline model for Weather4cast Rain Movie Prediction competition
WeatherFusionNet -> Predicting Precipitation from Satellite Data. weather4cast-2022 1st place solution
marinedebrisdetector -> Large-scale Detection of Marine Debris in Coastal Areas with Sentinel-2
kaggle-identify-contrails-4th -> 4th place Solution, Google Research - Identify Contrails to Reduce Global Warming
MineSegSAT -> An automated system to evaluate mining disturbed area extents from Sentinel-2 imagery
STARCOP: Semantic Segmentation of Methane Plumes with Hyperspectral Machine Learning models
asos -> Recognizing protected and anthropogenic patterns in landscapes using interpretable machine learning and satellite imagery
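Sea ice concentration is the fraction of an area covered by ice. If a per-pixel ice/water mask is already available, per-cell concentration values can be derived by block averaging, as in this sketch. The cell size and the helper name `ice_concentration` are assumptions, and this is not necessarily how sea_ice_remote_sensing builds its classes.

```python
import numpy as np

def ice_concentration(ice_mask: np.ndarray, cell: int) -> np.ndarray:
    """Fraction of ice pixels in each non-overlapping cell x cell block of a binary (H, W) mask.
    Rows and columns beyond the last full block are cropped."""
    h, w = ice_mask.shape
    h_c, w_c = h // cell, w // cell
    blocks = ice_mask[: h_c * cell, : w_c * cell].astype(np.float64)
    # Index [i, a, j, b] maps to pixel (i * cell + a, j * cell + b)
    blocks = blocks.reshape(h_c, cell, w_c, cell)
    return blocks.mean(axis=(1, 3))

mask = np.array([[1, 1, 1, 0],
                 [1, 1, 0, 1],
                 [0, 0, 0, 0],
                 [0, 0, 1, 0]])
conc = ice_concentration(mask, cell=2)  # [[1.0, 0.5], [0.0, 0.25]]
```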
Segmentation - Roads & sidewalks
Extracting roads is challenging due to the occlusions caused by other objects and the complex traffic environment
ChesapeakeRSC -> segmentation models extract roads from the background and are additionally evaluated on how they perform on the "Tree Canopy Over Road" class
Road detection using semantic segmentation and albumentations for data augmentation using the Massachusetts Roads Dataset, U-net & Keras. With code
ML_EPFL_Project_2 -> U-Net in Pytorch to perform semantic segmentation of roads on satellite images
Semantic Segmentation of roads using U-net Keras, OSM data, project summary article by student, no code
Winning Solutions from SpaceNet Road Detection and Routing Challenge
RoadVecNet -> Road-Network-Segmentation-and-Vectorization in keras with dataset
Detecting roads and road types -> Jupyter notebook
awesome-deep-map -> A curated list of resources dedicated to deep learning / computer vision algorithms for mapping. The mapping problems include road network inference, building footprint extraction, etc.
RoadTracer: Automatic Extraction of Road Networks from Aerial Images -> uses an iterative search process guided by a CNN-based decision function to derive the road network graph directly from the output of the CNN
road_detection_mtl -> Road Detection using a multi-task learning technique to improve the performance of the road detection task by incorporating prior knowledge constraints, uses the SpaceNet Roads Dataset
road_connectivity -> Improved Road Connectivity by Joint Learning of Orientation and Segmentation (CVPR2019)
Road-Network-Extraction using classical Image processing -> blur & Canny edge detection
SPIN_RoadMapper -> Extracting Roads from Aerial Images via Spatial and Interaction Space Graph Reasoning for Autonomous Driving
road_extraction_remote_sensing -> pytorch implementation, CVPR2018 DeepGlobe Road Extraction Challenge submission. See also DeepGlobe-Road-Extraction-Challenge
RoadDetections dataset by Microsoft
CoANet -> Connectivity Attention Network for Road Extraction From Satellite Imagery. The CoA module incorporates graphical information to better preserve road connectivity
Satellite Imagery Road Segmentation -> intro article on Medium using the kaggle Massachusetts Roads Dataset
Label-Pixels -> for semantic segmentation of roads and other features
Satellite-image-road-extraction -> Road Extraction by Deep Residual U-Net
road_building_extraction -> Pytorch implementation of U-Net architecture for road and building extraction
RCFSNet -> Road Extraction From Satellite Imagery by Road Context and Full-Stage Feature
SGCN -> Split Depth-Wise Separable Graph-Convolution Network for Road Extraction in Complex Environments From High-Resolution Remote-Sensing Images
ASPN -> Road Segmentation for Remote Sensing Images using Adversarial Spatial Pyramid Networks
FCNs-for-road-extraction-keras -> Road extraction of high-resolution remote sensing images based on various semantic segmentation networks
cresi -> Road network extraction from satellite imagery, with speed and travel time estimates
D-LinkNet -> LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction
Sat2Graph -> Road Graph Extraction through Graph-Tensor Encoding
Image-Segmentation -> using Massachusetts Road dataset and fast.ai
RoadTracer-M -> Road Network Extraction from Satellite Images Using CNN Based Segmentation and Tracing
ScRoadExtractor -> Scribble-based Weakly Supervised Deep Learning for Road Surface Extraction from Remote Sensing Images
RoadDA -> Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training for Road Segmentation of Remote Sensing Images
DeepSegmentor -> A Pytorch implementation of DeepCrack and RoadNet projects
Cascaded Residual Attention Enhanced Road Extraction from Remote Sensing Images
NL-LinkNet -> Toward Lighter but More Accurate Road Extraction with Non-Local Operations
IRSR-net -> Lightweight Remote Sensing Road Detection Network
hironex -> A python tool for automatic, fully unsupervised extraction of historical road networks from historical maps
Road_detection_model -> Mapping Roads in the Brazilian Amazon with Artificial Intelligence and Sentinel-2
DTnet -> Road detection via a dual-task network based on cross-layer graph fusion modules
Automatic-Road-Extraction-from-Historical-Maps-using-Deep-Learning-Techniques -> Automatic Road Extraction from Historical Maps using Deep Learning Techniques
Istanbul_Dataset -> segmentation on the Istanbul, Inria and Massachusetts datasets
Road-Segmentation -> Road segmentation on Satellite Images using CNN (U-Nets and FCN8) and Logistic Regression
D-LinkNet -> 1st place solution in DeepGlobe Road Extraction Challenge
PaRK-Detect -> PaRK-Detect: Towards Efficient Multi-Task Satellite Imagery Road Extraction via Patch-Wise Keypoints Detection
tile2net -> Mapping the walk: A scalable computer vision approach for generating sidewalk network datasets from aerial imagery
AerialLaneNet -> Building Lane-Level Maps from Aerial Images, introduces the AErial Lane (AEL) Dataset: a first large-scale aerial image dataset built for lane detection
sam_road -> Segment Anything Model (SAM) for large-scale, vectorized road network extraction from aerial imagery.
LRDNet -> A Lightweight Road Detection Algorithm Based on Multiscale Convolutional Attention Network and Coupled Decoder Head
Fine–Grained Extraction of Road Networks via Joint Learning of Connectivity and Segmentation -> uses SpaceNet 3 dataset
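D-LinkNet enlarges its receptive field with dilated convolutions. For a stack of convolutions, each layer adds (k - 1) * d * J pixels to the receptive field, where k is the kernel size, d the dilation and J the product of the preceding strides. The rates 1, 2, 4, 8 below are simply an example of such a cascade; check the D-LinkNet paper or code for its exact configuration.

```python
def receptive_field(kernel_sizes, dilations, strides=None):
    """Receptive field, in input pixels along one axis, of a stack of convolutions."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1  # jump: input-pixel spacing between adjacent positions of the current layer
    for k, d, s in zip(kernel_sizes, dilations, strides):
        rf += (k - 1) * d * jump  # a dilated kernel spans (k - 1) * d + 1 positions
        jump *= s
    return rf

print(receptive_field([3, 3, 3, 3], [1, 2, 4, 8]))  # 31
print(receptive_field([3, 3, 3, 3], [1, 1, 1, 1]))  # 9
```

With the same number of 3x3 layers and no extra parameters, the dilated cascade sees 31 pixels instead of 9, which is why this design suits long, thin structures like roads.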
Segmentation - Buildings & rooftops
Road and Building Semantic Segmentation in Satellite Imagery uses U-Net on the Massachusetts Roads Dataset & keras
find unauthorized constructions using aerial photography -> Dataset creation
SRBuildSeg -> Making low-resolution satellite images reborn: a deep learning approach for super-resolution building extraction
Building footprint detection with fastai on the challenging SpaceNet7 dataset uses U-Net & fastai
Pix2Pix-for-Semantic-Segmentation-of-Satellite-Images -> using Pix2Pix GAN network to segment the building footprint from Satellite Images, uses tensorflow
SpaceNetUnet -> Baseline model is U-net like, applied to SpaceNet Vegas data, using Keras
automated-building-detection -> Input: very-high-resolution (<= 0.5 m/pixel) RGB satellite images. Output: buildings in vector format (geojson), to be used in digital map products. Built on top of robosat and robosat.pink.
project_sunroof_india -> Analyzed Google Satellite images to generate a report on the solar power potential of individual house rooftops, using a range of classical computer vision techniques (e.g. Canny Edge Detection) to segment the roofs
JointNet-A-Common-Neural-Network-for-Road-and-Building-Extraction
Mapping Africa’s Buildings with Satellite Imagery: Google AI blog post. See the open-buildings dataset
nz_convnet -> A U-net based ConvNet for New Zealand imagery to classify building outlines
polycnn -> End-to-End Learning of Polygons for Remote Sensing Image Classification
spacenet_building_detection solution by motokimura using Unet
Vec2Instance -> applied to the SpaceNet challenge AOI 2 (Vegas) building footprint dataset, tensorflow v1.12
EarthquakeDamageDetection -> Buildings segmentation from satellite imagery and damage classification for each building, using Keras
Semantic-segmentation repo by fuweifu-vtoo -> uses pytorch and the Massachusetts Buildings & Roads Datasets
Extracting buildings and roads from AWS Open Data using Amazon SageMaker -> With repo
TF-SegNet -> AirNet is a segmentation network based on SegNet, but with some modifications
rgb-footprint-extract -> a Semantic Segmentation Network for Urban-Scale Building Footprint Extraction Using RGB Satellite Imagery, DeepLabV3+ module with a Dilated ResNet C42 backbone
SpaceNetExploration -> A sample project demonstrating how to extract building footprints from satellite images using a semantic segmentation model. Data from the SpaceNet Challenge
Rooftop-Instance-Segmentation -> VGG-16, Instance Segmentation, uses the Airs dataset
solar-farms-mapping -> An Artificial Intelligence Dataset for Solar Energy Locations in India
poultry-cafos -> This repo contains code for detecting poultry barns from high-resolution aerial imagery and an accompanying dataset of predicted barns over the United States
ssai-cnn -> This is an implementation of Volodymyr Mnih's dissertation methods on his Massachusetts road & building dataset
Remote-sensing-building-extraction-to-3D-model-using-Paddle-and-Grasshopper
segmentation-enhanced-resunet -> Urban building extraction in Daejeon region using Modified Residual U-Net (Modified ResUnet) and applying post-processing
Mask RCNN for Spacenet Off Nadir Building Detection
GRSL_BFE_MA -> Deep Learning-based Building Footprint Extraction with Missing Annotations using a novel loss function
FER-CNN -> Detection, Classification and Boundary Regularization of Buildings in Satellite Imagery Using Faster Edge Region Convolutional Neural Networks
UNET-Image-Segmentation-Satellite-Picture -> Unet to predict rooftops on the crowdAI Mapping dataset, uses keras
Vector-Map-Generation-from-Aerial-Imagery-using-Deep-Learning-GeoSpatial-UNET -> applied to very large geo-referenced images (> 10k x 10k pixels)
building-footprint-segmentation -> pip installable library to train building footprint segmentation on satellite and aerial imagery, applied to Massachusetts Buildings Dataset and Inria Aerial Image Labeling Dataset
SemSegBuildings -> Project using fast.ai framework for semantic segmentation on Inria building segmentation dataset
FCNN-example -> overfit to a given single image to detect houses
SAT2LOD2 -> an open-source, python-based GUI-enabled software that takes the satellite images as inputs and returns LoD2 building models as outputs
SatFootprint -> building segmentation on the Spacenet 7 dataset
Building-Detection -> Raster Vision experiment to train a model to detect buildings from satellite imagery in three cities in Latin America
Multi-building-tracker -> Multi-target building tracker for satellite images using deep learning
Boundary Enhancement Semantic Segmentation for Building Extraction
keras code for binary semantic segmentation
Spacenet-Building-Detection
LGPNet-BCD -> Building Change Detection for VHR Remote Sensing Images via Local-Global Pyramid Network and Cross-Task Transfer Learning Strategy
MTL_homoscedastic_SRB -> A Multi-Task Deep Learning Framework for Building Footprint Segmentation
UNet_CNN -> UNet model to segment building coverage in Boston using Remote sensing data, uses keras
FDANet -> Full-Level Domain Adaptation for Building Extraction in Very-High-Resolution Optical Remote-Sensing Images
CBRNet -> A Coarse-to-fine Boundary Refinement Network for Building Extraction from Remote Sensing Imagery
ASLNet -> Adversarial Shape Learning for Building Extraction in VHR Remote Sensing Images
BRRNet -> A Fully Convolutional Neural Network for Automatic Building Extraction From High-Resolution Remote Sensing Images
Multi-Scale-Filtering-Building-Index -> A Multi-Scale Filtering Building Index for Building Extraction in Very High-Resolution Satellite Imagery
Models for Remote Sensing -> long list of unets etc applied to building detection
boundary_loss_for_remote_sensing -> Boundary Loss for Remote Sensing Imagery Semantic Segmentation
Open Cities AI Challenge -> Segmenting Buildings for Disaster Resilience. Winning solutions on Github
MAPNet -> Multi Attending Path Neural Network for Building Footprint Extraction from Remote Sensed Imagery
dual-hrnet -> localizing buildings and classifying their damage level
ESFNet -> Efficient Network for Building Extraction from High-Resolution Aerial Images
rooftop-detection-python -> Detect Rooftops from low resolution satellite images and calculate area for cultivation and solar panel installation using classical computer vision techniques
keras_segmentation_models -> Using Open Vector-Based Spatial Data to Create Semantic Datasets for Building Segmentation for Raster Data
CVCMFFNet -> Complex-Valued Convolutional and Multifeature Fusion Network for Building Semantic Segmentation of InSAR Images
STEB-UNet -> A Swin Transformer-Based Encoding Booster Integrated in U-Shaped Network for Building Extraction
dfc2020_baseline -> Baseline solution for the IEEE GRSS Data Fusion Contest 2020. Predict land cover labels from Sentinel-1 and Sentinel-2 imagery
Fusing multiple segmentation models based on different datasets into a single edge-deployable model -> roof, car & road segmentation
ground-truth-gan-segmentation -> use Pix2Pix to segment the footprint of a building. The dataset used is AIRS
UNICEF-Giga_Sudan -> Detecting school lots from satellite imagery in Southern Sudan using a UNET segmentation model
building_footprint_extraction -> The project retrieves satellite imagery from Google and performs building footprint extraction using a U-Net.
projectRegularization -> Regularization of building boundaries in satellite images using adversarial and regularized losses
PolyWorldPretrainedNetwork -> Polygonal Building Extraction with Graph Neural Networks in Satellite Images
dl_image_segmentation -> Uncertainty-Aware Interpretable Deep Learning for Slum Mapping and Monitoring. Uses SHAP
UBC-dataset -> a dataset for building detection and classification from very high-resolution satellite imagery with the focus on object-level interpretation of individual buildings
UNetFormer -> A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery
BES-Net -> Boundary Enhancing Semantic Context Network for High-Resolution Image Semantic Segmentation. Applied to Vaihingen and Potsdam datasets
CVNet -> Contour Vibration Network for Building Extraction
CFENet -> A Context Feature Enhancement Network for Building Extraction from High-Resolution Remote Sensing Imagery
HiSup -> Accurate Polygonal Mapping of Buildings in Satellite Imagery
BuildingExtraction -> Building Extraction from Remote Sensing Images with Sparse Token Transformers
CrossGeoNet -> A Framework for Building Footprint Generation of Label-Scarce Geographical Regions
AFM_building -> Building Footprint Generation Through Convolutional Neural Networks With Attraction Field Representation
RAMP (Replicable AI for MicroPlanning) -> building detection in low and middle income countries
Building-instance-segmentation -> Multi-Modal Feature Fusion Network with Adaptive Center Point Detector for Building Instance Extraction
CGSANet -> A Contour-Guided and Local Structure-Aware Encoder–Decoder Network for Accurate Building Extraction From Very High-Resolution Remote Sensing Imagery
building-footprints-update -> Learning Color Distributions from Bitemporal Remote Sensing Images to Update Existing Building Footprints
RAMP -> model and buildings dataset to support a wide variety of humanitarian use cases
Thesis_Semantic_Image_Segmentation_on_Satellite_Imagery_using_UNets -> This master thesis aims to perform semantic segmentation of buildings on satellite images from the SpaceNet challenge 1 dataset using the U-Net architecture
HD-Net -> High-resolution decoupled network for building footprint extraction via deeply supervised body and boundary decomposition
RoofSense -> A novel deep learning solution for the automatic roofing material classification of the Dutch building stock using aerial imagery and laser scanning data fusion
IBS-AQSNet -> Enhanced Automated Quality Assessment Network for Interactive Building Segmentation in High-Resolution Remote Sensing Imagery
DeepMAO -> Deep Multi-scale Aware Overcomplete Network for Building Segmentation in Satellite Imagery
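Most of the building-footprint entries above report overlap scores between predicted and reference masks. A minimal pixel-level IoU and F1 (equivalently Dice) is sketched below; note that challenges such as SpaceNet score at the object level instead, so this is a simplification, and `mask_scores` is a hypothetical helper name.

```python
import numpy as np

def mask_scores(pred: np.ndarray, target: np.ndarray):
    """Pixel-level IoU and F1 (Dice) between two binary masks of the same shape."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    if tp + fp + fn == 0:
        return 1.0, 1.0  # both masks empty: treat as perfect agreement
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return float(iou), float(f1)

pred = np.array([[1, 1, 0], [0, 0, 0]])
target = np.array([[1, 0, 0], [1, 0, 0]])
iou, f1 = mask_scores(pred, target)  # tp=1, fp=1, fn=1 -> IoU 1/3, F1 0.5
```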
Segmentation - Solar panels
Deep-Learning-for-Solar-Panel-Recognition -> using both object detection with Yolov5 and Unet segmentation
DeepSolar -> A Machine Learning Framework to Efficiently Construct a Solar Deployment Database in the United States. Dataset on kaggle. The model is actually a CNN classifier, and segmentation is obtained by applying a threshold to the activation map. The original code is tf1, but tf2/keras and pytorch implementations are available. Also check out Visualizations and in-depth analysis .. of the factors that can explain the adoption of solar energy in .. Virginia and DeepSolar tracker: towards unsupervised assessment with open-source data of the accuracy of deep learning-based distributed PV mapping
hyperion_solar_net -> trained classification & segmentation models on RGB imagery from Google Maps
3D-PV-Locator -> Large-scale detection of rooftop-mounted photovoltaic systems in 3D
PV_Pipeline -> DeepSolar for Germany
solar-panels-detection -> using SegNet, Fast SCNN & ResNet
predict_pv_yield -> Using optical flow & machine learning to predict PV yield
Large-scale-solar-plant-monitoring -> Remote Sensing for Monitoring of Photovoltaic Power Plants in Brazil Using Deep Semantic Segmentation
Panel-Segmentation -> Determine the presence of a solar array in the satellite image (boolean True/False), using a VGG16 classification model
Roofpedia -> an open registry of green roofs and solar roofs across the globe identified by Roofpedia through deep learning
Predicting the Solar Potential of Rooftops using Image Segmentation and Structured Data -> Medium article, using 20 cm imagery & Unet
solar-pv-global-inventory
remote-sensing-solar-pv -> A repository for sharing progress on the automated detection of solar PV arrays in sentinel-2 remote sensing imagery
solar-panel-segmentation -> Finding solar panels using USGS satellite imagery
solar_seg -> Solar segmentation of PV modules (sub elements of panels) using drone images and fast.ai
solar_plant_detection -> boundary extraction of Photovoltaic (PV) plants using Mask RCNN and Amir dataset
SolarDetection -> unet on satellite image from the USA and France
adopptrs -> Automatic Detection Of Photovoltaic Panels Through Remote Sensing using unet & pytorch
solar-panel-locator -> the number of solar panel pixels was only ~0.2% of the total pixels in the dataset, so solar panel data was upsampled to account for the class imbalance
projects-solar-panel-detection -> List of projects to detect solar panels from aerial/satellite images
Satellite_ComputerVision -> UNET to detect solar arrays from Sentinel-2 data, using Google Earth Engine and Tensorflow. Also covers parking lot detection
photovoltaic-detection -> Detecting available rooftop area from satellite images to install photovoltaic panels
Solar_UNet -> U-Net models delineating solar arrays in Sentinel-2 imagery
SolarDetection-solafune -> Solar Panel Detection Using Sentinel-2 for the Solafune Competition
A Comparative Evaluation of Deep Learning Techniques for Photovoltaic Panel Detection from Aerial Images
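DeepSolar's segmentation comes from thresholding a classifier's activation map. One standard way to build such a map is class activation mapping (CAM): weight the final convolutional feature maps by the classifier weights of the target class. The sketch assumes a global-average-pool plus linear classifier head and skips the upsampling to image resolution; it is not DeepSolar's own code, and the 0.5 threshold is arbitrary.

```python
import numpy as np

def cam_mask(features: np.ndarray, class_weights: np.ndarray, threshold: float = 0.5):
    """CAM from last-layer features (C, H, W) and target-class weights (C,),
    min-max normalised to [0, 1] and thresholded into a binary mask."""
    cam = np.tensordot(class_weights, features, axes=(0, 0))  # weighted sum over channels -> (H, W)
    cam = cam - cam.min()
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam, cam >= threshold

# Two toy feature channels on a 2x2 grid and made-up "solar panel" class weights
features = np.array([[[1.0, 0.0], [0.0, 0.0]],
                     [[0.0, 1.0], [0.0, 0.0]]])
weights = np.array([2.0, 1.0])
cam, mask = cam_mask(features, weights)  # cam [[1.0, 0.5], [0.0, 0.0]]
```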
Segmentation - Ships & vessels
Universal-segmentation-baseline-Kaggle-Airbus-Ship-Detection -> Kaggle Airbus Ship Detection Challenge - bronze medal solution
Airbus-Ship-Segmentation -> unet
contrastive_SSL_ship_detection -> Contrastive self supervised learning for ship detection in Sentinel 2 images
airbus-ship-detection -> using DeepLabV3+
Unet with web-application applied to Airbus ships
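The Airbus Ship Detection entries above work with masks in run-length encoded form. A minimal encoder/decoder is sketched below, assuming the Kaggle convention of 1-indexed "start length" pairs with pixels numbered top to bottom, then left to right (column-major); verify this against the competition's evaluation page before relying on it.

```python
import numpy as np

def rle_encode(mask: np.ndarray) -> str:
    """Encode a binary (H, W) mask as space-separated 1-indexed 'start length' pairs."""
    pixels = np.concatenate([[0], mask.flatten(order="F").astype(np.uint8), [0]])
    changes = np.flatnonzero(pixels[1:] != pixels[:-1]) + 1  # run starts and (end + 1) positions
    changes[1::2] -= changes[::2]                            # turn end positions into run lengths
    return " ".join(str(v) for v in changes)

def rle_decode(rle: str, shape) -> np.ndarray:
    """Decode 'start length' pairs back into a binary mask of the given (H, W) shape."""
    values = np.array(rle.split(), dtype=int) if rle else np.array([], dtype=int)
    starts, lengths = values[0::2] - 1, values[1::2]
    flat = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    for s, n in zip(starts, lengths):
        flat[s:s + n] = 1
    return flat.reshape(shape, order="F")

m = np.array([[0, 1],
              [1, 1],
              [0, 0]])
encoded = rle_encode(m)  # "2 1 4 2"
```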
Segmentation - Other manmade
Aarsh2001/ML_Challenge_NRSC -> Electrical Substation detection
electrical_substation_detection
MCAN-OilSpillDetection -> Oil Spill Detection with A Multiscale Conditional Adversarial Network under Small Data Training
mining-detector -> detection of artisanal gold mines in Sentinel-2 satellite imagery for Amazon Mining Watch. Also covers clandestine airstrips
EG-UNet -> Deep Feature Enhancement Method for Land Cover With Irregular and Sparse Spatial Distribution Features: A Case Study on Open-Pit Mining
plastics -> Detecting and Monitoring Plastic Waste Aggregations in Sentinel-2 Imagery
MADOS -> Detecting Marine Pollutants and Sea Surface Features with Deep Learning in Sentinel-2 Imagery on the MADOS dataset
SADMA -> Residual Attention UNet on MARIDA: Marine Debris Archive is a marine debris-oriented dataset on Sentinel-2 satellite images
MAP-Mapper -> Marine Plastic Mapper is a tool for assessing marine macro-plastic density to identify plastic hotspots, underpinned by the MARIDA dataset.
Panoptic segmentation
Things and stuff or how remote sensing could benefit from panoptic segmentation
utae-paps -> PyTorch implementation of U-TAE and PaPs for satellite image time series panoptic segmentation
pastis-benchmark
Panoptic-Generator -> This module converts GIS data into panoptic segmentation tiles
BSB-Aerial-Dataset -> an example on how to use Detectron2's Panoptic-FPN in the BSB Aerial Dataset
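Panoptic methods like those listed above are commonly evaluated with Panoptic Quality, PQ = (sum of IoU over matched pairs) / (|TP| + 0.5 |FP| + 0.5 |FN|), where a predicted and a ground-truth segment match when their IoU exceeds 0.5. A minimal single-class sketch follows; dataset-level PQ averages this over classes, and the helper names are hypothetical.

```python
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def panoptic_quality(pred_segments, gt_segments) -> float:
    """PQ for one class. Each argument is a list of non-overlapping boolean (H, W) masks.
    With non-overlapping segments an IoU > 0.5 match is unique, so a greedy search suffices."""
    used, matched_ious = set(), []
    for g in gt_segments:
        for j, p in enumerate(pred_segments):
            if j not in used and iou(p, g) > 0.5:
                used.add(j)
                matched_ious.append(iou(p, g))
                break
    tp = len(matched_ious)
    fp = len(pred_segments) - tp
    fn = len(gt_segments) - tp
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched_ious) / denom if denom else 1.0

def row_mask(cols, width=6):
    m = np.zeros((1, width), dtype=bool)
    m[0, cols] = True
    return m

gt = [row_mask([0, 1]), row_mask([3, 4])]
pred = [row_mask([0, 1, 2]), row_mask([5])]
pq = panoptic_quality(pred, gt)  # one TP (IoU 2/3), one FP, one FN -> (2/3) / 2 = 1/3
```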
Segmentation - Miscellaneous
seg-eval -> SegEval is a Python library that provides tools for evaluating semantic segmentation models. It can generate evaluation regions and analyze segmentation results within them.
awesome-satellite-images-segmentation
Satellite Image Segmentation: a Workflow with U-Net is a decent intro article
mmsegmentation -> Semantic Segmentation Toolbox with support for many remote sensing datasets including LoveDA, Potsdam, Vaihingen & iSAID
segmentation_gym -> A neural gym for training deep learning models to carry out geoscientific image segmentation
How to create a DataBlock for Multispectral Satellite Image Semantic Segmentation using Fastai
Using a U-Net for image segmentation, blending predicted patches smoothly is a must to please the human eye -> python code to blend predicted patches smoothly. See Satellite-Image-Segmentation-with-Smooth-Blending
DCA -> Deep Covariance Alignment for Domain Adaptive Remote Sensing Image Segmentation
SCAttNet -> Semantic Segmentation Network with Spatial and Channel Attention Mechanism
unetseg -> A set of classes and CLI tools for training a semantic segmentation model based on the U-Net architecture, using Tensorflow and Keras. This implementation is tuned specifically for satellite imagery and other geospatial raster data
Semantic Segmentation of Satellite Imagery using U-Net & fast.ai -> with repo
clusternet_segmentation -> Unsupervised Segmentation by applying K-Means clustering to the features generated by Neural Network
Efficient-Transformer -> Efficient Transformer for Remote Sensing Image Segmentation
weakly_supervised -> Weakly Supervised Deep Learning for Segmentation of Remote Sensing Imagery
HRCNet-High-Resolution-Context-Extraction-Network -> High-Resolution Context Extraction Network for Semantic Segmentation of Remote Sensing Images
Semantic segmentation of SAR images using a self supervised technique
satellite-segmentation-pytorch -> explores a wide variety of image augmentations to increase training dataset size
Spectralformer -> Rethinking hyperspectral image classification with transformers
Unsupervised Segmentation of Hyperspectral Remote Sensing Images with Superpixels
Semantic-Segmentation-with-Sparse-Labels
SNDF -> Superpixel-enhanced deep neural forest for remote sensing image semantic segmentation
Satellite-Image-Classification -> using random forest or support vector machines (SVM) and sklearn
dynamic-rs-segmentation -> Dynamic Multi-Context Segmentation of Remote Sensing Images based on Convolutional Networks
2023GRIC -> Combining UPerNet and ConvNeXt for Contrails Identification to reduce Global Warming
segmentation_models.pytorch -> Segmentation models with pretrained backbones, has been used in multiple winning solutions to remote sensing competitions
SSRN -> Spectral-Spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework
SO-DNN -> Simplified object-based deep neural network for very high resolution remote sensing image classification
SANet -> Scale-Aware Network for Semantic Segmentation of High-Resolution Aerial Images
aerial-segmentation -> Learning Aerial Image Segmentation from Online Maps
IterativeSegmentation -> Recurrent Neural Networks to Correct Satellite Image Classification Maps
Detectron2 FPN + PointRend Model for amazing Satellite Image Segmentation -> reports a 15% increase in accuracy compared to the U-Net model
HybridSN -> Exploring 3D-2D CNN Feature Hierarchy for Hyperspectral Image Classification
TNNLS_2022_X-GPN -> Semisupervised Cross-scale Graph Prototypical Network for Hyperspectral Image Classification
singleSceneSemSegTgrs2022 -> Unsupervised Single-Scene Semantic Segmentation for Earth Observation
A-Fast-and-Compact-3-D-CNN-for-HSIC -> A Fast and Compact 3-D CNN for Hyperspectral Image Classification
HSNRS -> Hourglass-ShapeNetwork Based Semantic Segmentation for High Resolution Aerial Imagery
GiGCN -> Graph-in-Graph Convolutional Network for Hyperspectral Image Classification
SSAN -> Spectral-Spatial Attention Networks for Hyperspectral Image Classification
drone-images-semantic-segmentation -> Multiclass Semantic Segmentation of Aerial Drone Images Using Deep Learning
Satellite-Image-Segmentation-with-Smooth-Blending -> uses Smoothly-Blend-Image-Patches
BayesianUNet -> Pytorch Bayesian UNet model for segmentation and uncertainty prediction, applied to the Potsdam Dataset
RAANet -> A Residual ASPP with Attention Framework for Semantic Segmentation of High-Resolution Remote Sensing Images
wheelRuts_semanticSegmentation -> Mapping wheel-ruts from timber harvesting operations using deep learning techniques in drone imagery
LWN-for-UAVRSI -> Light-Weight Semantic Segmentation Network for UAV Remote Sensing Images, applied to Vaihingen, UAVid and UDD6 datasets
hypernet -> library which implements hyperspectral image (HSI) segmentation
ST-UNet -> Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation
EDFT -> Efficient Depth Fusion Transformer for Aerial Image Semantic Segmentation
WiCoNet -> Looking Outside the Window: Wide-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images
CRGNet -> Consistency-Regularized Region-Growing Network for Semantic Segmentation of Urban Scenes with Point-Level Annotations
SA-UNet -> Improved U-Net Remote Sensing Classification Algorithm Fusing Attention and Multiscale Features
MANet -> Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images
BANet -> Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images
MACU-Net -> MACU-Net for Semantic Segmentation of Fine-Resolution Remotely Sensed Images
DNAS -> Decoupling Neural Architecture Search for High-Resolution Remote Sensing Image Semantic Segmentation
A2-FPN -> A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images
MAResU-Net -> Multi-stage Attention ResU-Net for Semantic Segmentation of Fine-Resolution Remote Sensing Images
ml_segmentation -> semantic segmentation of buildings using Random Forest, Support Vector Machine (SVM) & Gradient Boosting Classifier (GBC)
RSEN -> Robust Self-Ensembling Network for Hyperspectral Image Classification
MSNet -> multispectral semantic segmentation network for remote sensing images
k-textures -> K-textures, a self-supervised hard clustering deep learning algorithm for satellite image segmentation
Swin-Transformer-Semantic-Segmentation -> Satellite Image Semantic Segmentation
UDA_for_RS -> Unsupervised Domain Adaptation for Remote Sensing Semantic Segmentation with Transformer
A-3D-CNN-AM-DSC-model-for-hyperspectral-image-classification -> Attention Mechanism and Depthwise Separable Convolution Aided 3DCNN for Hyperspectral Remote Sensing Image Classification
contrastive-distillation -> A Contrastive Distillation Approach for Incremental Semantic Segmentation in Aerial Images
SegForestNet -> SegForestNet: Spatial-Partitioning-Based Aerial Image Segmentation
MFVNet -> MFVNet: Deep Adaptive Fusion Network with Multiple Field-of-Views for Remote Sensing Image Semantic Segmentation
Wildebeest-UNet -> detecting wildebeest and zebras in Serengeti-Mara ecosystem from very-high-resolution satellite imagery
segment-anything-eo -> Earth observation tools for Meta AI Segment Anything (SAM - Segment Anything Model)
HR-Image-classification_SDF2N -> A Shallow-to-Deep Feature Fusion Network for VHR Remote Sensing Image Classification
sink-seg -> Automatic Segmentation of Sinkholes Using a Convolutional Neural Network
Tiling and Stitching Segmentation Output for Remote Sensing: Basic Challenges and Recommendations
EMRT -> Enhancing Multiscale Representations With Transformer for Remote Sensing Image Semantic Segmentation
CMTFNet -> CMTFNet: CNN and Multiscale Transformer Fusion Network for Remote Sensing Image Semantic Segmentation
CM-UNet -> Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation
Using Stable Diffusion to Improve Image Segmentation Models -> Augmenting Data with Stable Diffusion
SSRS -> Semantic Segmentation for Remote Sensing, multiple networks implemented
Instance segmentation
In instance segmentation, each individual 'instance' of a segmented area is given a unique label. For detection of very small objects this may be a good approach, but it can struggle to separate individual objects that are closely spaced.
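To illustrate the difference from semantic segmentation, the sketch below turns a binary semantic mask into instance labels using 4-connected component labeling. This is a classical post-processing step chosen for illustration. It is not how Mask R-CNN or the other models listed below work, and the function name `label_instances` is hypothetical. The example also shows the failure mode described above: two objects whose masks touch end up with a single label.

```python
from collections import deque


def label_instances(mask):
    """Give each 4-connected region of 1s in a binary mask its own integer label.

    mask: list of lists of 0/1 (a binary semantic mask).
    Returns (labels, count); labels has the same shape and 0 marks background.
    """
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == 1 and labels[r][c] == 0:
                count += 1  # found an unlabeled foreground pixel: start a new instance
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill over 4-neighbours
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Two separated buildings, then the same two buildings touching each other.
separate = [[1, 1, 0, 1],
            [1, 1, 0, 1]]
touching = [[1, 1, 1, 1],
            [1, 1, 1, 1]]
print(label_instances(separate)[1])  # 2
print(label_instances(touching)[1])  # 1
```

Learned instance segmentation models predict a separate mask for each object, so they can split touching objects where connected components cannot.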
Mask_RCNN generates bounding boxes and segmentation masks for each instance of an object in the image. It is very commonly used for instance segmentation & object detection
Instance segmentation of center pivot irrigation system in Brazil using free Landsat images, mask R-CNN & Keras
Building-Detection-MaskRCNN -> Building detection from the SpaceNet dataset by using Mask RCNN
Oil tank instance segmentation with Mask R-CNN with accompanying article using Keras & Airbus Oil Storage Detection Dataset on Kaggle
Mask_RCNN-for-Caravans -> detect caravan footprints from OS imagery
parking_bays_detectron2 -> Detecting parking bays with satellite imagery. Used Detectron2 and synthetic data generated with Unreal, giving superior performance to Mask RCNN
Locate buildings with a dark roof that feed heat island phenomenon using Mask RCNN -> with repo, uses the INRIA dataset & labelme for annotation
Circle_Finder -> Circular Shapes Detection in Satellite Imagery, 2nd place solution to the Circle Finder Challenge
Lawn_maskRCNN -> Detecting lawns from satellite images of properties in the Cedar Rapids area using Mask-R-CNN
CropMask_RCNN -> Segmenting center pivot agriculture to monitor crop water use in drylands with Mask R-CNN and Landsat satellite imagery
CATNet -> Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images
Object-Detection-on-Satellite-Images-using-Mask-R-CNN -> detect ships
FactSeg -> Foreground Activation Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery (TGRS); also see FarSeg and FreeNet, implementations of research papers
aqua_python -> detecting aquaculture farms using Mask R-CNN
RSPrompter -> Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model
![Object detection](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/object-detection.png)
Object detection in remote sensing involves locating and surrounding objects of interest with bounding boxes. Due to the large size of remote sensing images and the fact that objects may only comprise a few pixels, object detection can be challenging in this context. The imbalance between the area of the objects to be detected and the background, combined with the potential for objects to be easily confused with random features in the background, further complicates the task. Object detection generally performs better on larger objects, but becomes increasingly difficult as the objects become smaller and more densely packed. The accuracy of object detection models can also degrade rapidly as image resolution decreases, which is why it is common to use high resolution imagery, such as 30cm RGB, for object detection in remote sensing. A unique characteristic of aerial images is that objects can be oriented in any direction. To effectively extract measurements of the length and width of an object, it can be crucial to use rotated bounding boxes that align with the orientation of the object. This approach enables more accurate and meaningful analysis of the objects within the image. Image source
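Large scenes are usually cut into overlapping chips before being passed to a detector. The following is a minimal sketch of how the tile offsets can be computed. It assumes square tiles of 640 px with a 64 px overlap. Both values are illustrative, and the function names are hypothetical, not part of any library listed here. The overlap gives an object that is cut by one tile boundary a chance to appear whole in the neighbouring tile. After detection, adding a tile's (row, col) offset to its tile-local box coordinates maps them back into scene coordinates.

```python
def tile_offsets(length, tile, overlap):
    """Start offsets of windows of size `tile` that cover [0, length).

    Consecutive windows overlap by `overlap` pixels. The last window is shifted
    back so it ends exactly at the image edge, so no padding is needed.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if tile >= length:
        return [0]  # the whole axis fits in a single tile
    stride = tile - overlap
    offsets = list(range(0, length - tile + 1, stride))
    if offsets[-1] + tile < length:
        offsets.append(length - tile)  # cover the remaining strip at the edge
    return offsets


def tile_windows(height, width, tile=640, overlap=64):
    """Yield (row, col, tile_height, tile_width) windows over a large image."""
    for r in tile_offsets(height, tile, overlap):
        for c in tile_offsets(width, tile, overlap):
            yield r, c, min(tile, height), min(tile, width)


print(tile_offsets(2000, 640, 64))  # [0, 576, 1152, 1360]
print(len(list(tile_windows(1000, 2000))))  # 8
```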
Object tracking in videos
TCTrack -> Temporal Contexts for Aerial Tracking
CFME -> Object Tracking in Satellite Videos by Improved Correlation Filters With Motion Estimations
TGraM -> Multi-Object Tracking in Satellite Videos with Graph-Based Multi-Task Modeling
satellite_video_mod_groundtruth -> groundtruth on satellite video for evaluating moving object detection algorithm
Moving-object-detection-DSFNet -> DSFNet: Dynamic and Static Fusion Network for Moving Object Detection in Satellite Videos
HiFT -> Hierarchical Feature Transformer for Aerial Tracking
Object detection with rotated bounding boxes
Oriented bounding boxes (OBB) are polygons representing rotated rectangles. For datasets, check out DOTA & HRSC2016. Start with YOLOv8.
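A common way to parameterise a rotated box is centre, width, height and angle, while datasets such as DOTA store boxes as four corner points. The sketch below converts the first form into corners and compares the result with the axis-aligned box that encloses it. It assumes one particular convention: the angle is in degrees, measured counter-clockwise from the x axis in a y-up frame. Libraries differ here (radians, restricted angle ranges, y-down image axes), so treat this as an illustration rather than any toolbox's definition. The function names are hypothetical.

```python
import math


def obb_to_polygon(cx, cy, w, h, angle_deg):
    """Return the 4 corners of a rotated rectangle.

    (cx, cy) is the centre, w and h are the side lengths, and angle_deg is the
    counter-clockwise rotation of the w side from the x axis (y-up frame).
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the corner offset about the origin, then translate to the centre.
        corners.append((cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a))
    return corners


def axis_aligned_area(corners):
    """Area of the smallest axis-aligned box that encloses the given corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# A 100 x 20 px ship rotated by 45 degrees.
ship = obb_to_polygon(50, 50, 100, 20, 45)
print(round(axis_aligned_area(ship) / (100 * 20), 2))  # 3.6
```

For an elongated object at 45 degrees, the horizontal box covers about 3.6 times the object's own area. Most of that area is background or neighbouring objects, which is why rotated boxes suit ships, planes and vehicles. The OBB also keeps the object's length and width directly, as `w` and `h`.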
mmrotate -> Rotated Object Detection Benchmark, with pretrained models and function for inferencing on very large images
OBBDetection -> an oriented object detection library, which is based on MMdetection
rotate-yolov3 -> Rotation object detection implemented with yolov3. Also see yolov3-polygon
DRBox -> for detection tasks where the objects are orientated arbitrarily, e.g. vehicles, ships and airplanes
s2anet -> Align Deep Features for Oriented Object Detection
CFC-Net -> A Critical Feature Capturing Network for Arbitrary-Oriented Object Detection in Remote Sensing Images
ReDet -> A Rotation-equivariant Detector for Aerial Object Detection
BBAVectors-Oriented-Object-Detection -> Oriented Object Detection in Aerial Images with Box Boundary-Aware Vectors
CSL_RetinaNet_Tensorflow -> Arbitrary-Oriented Object Detection with Circular Smooth Label
r3det-on-mmdetection -> R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object
R-DFPN_FPN_Tensorflow -> Rotation Dense Feature Pyramid Networks (Tensorflow)
R2CNN_Faster-RCNN_Tensorflow -> Rotational region detection based on Faster-RCNN
Rotated-RetinaNet -> implemented in pytorch, it supports the following datasets: DOTA, HRSC2016, ICDAR2013, ICDAR2015, UCAS-AOD, NWPU VHR-10, VOC2007
OBBDet_Swin -> The sixth-place solution in the 2021 Gaofen Challenge
CG-Net -> Learning Calibrated-Guidance for Object Detection in Aerial Images
OrientedRepPoints_DOTA -> Oriented RepPoints + Swin Transformer/ReResNet
yolov5_obb -> yolov5 + Oriented Object Detection
How to Train YOLOv5 OBB -> YOLOv5 OBB tutorial and YOLOv5 OBB notebook
OHDet_Tensorflow -> can be applied to rotation detection and object heading detection
Seodore -> framework maintaining recent updates of mmdetection
Rotation-RetinaNet-PyTorch -> oriented detector Rotation-RetinaNet implementation on Optical and SAR ship dataset
AIDet -> an open source object detection in aerial image toolbox based on MMDetection
rotation-yolov5 -> rotation detection based on yolov5
ShipDetection -> Ship Detection in HR Optical Remote Sensing Images via Rotated Bounding Box, based on Faster R-CNN and ORN, uses caffe
SLRDet -> project based on mmdetection to reimplement RRPN and use the model Faster R-CNN OBB
AxisLearning -> Axis Learning for Orientated Objects Detection in Aerial Images
Detection_and_Recognition_in_Remote_Sensing_Image -> This work uses PaNet to realize Detection and Recognition in Remote Sensing Image by MXNet
DrBox-v2-tensorflow -> tensorflow implementation of DrBox-v2 which is an improved detector with rotatable boxes for target detection in remote sensing images
Rotation-EfficientDet-D0 -> A PyTorch implementation of a rotation detector based on the EfficientDet detector, applied to custom rotation vehicle datasets
DODet -> Dual alignment for oriented object detection, uses DOTA dataset
GF-CSL -> Gaussian Focal Loss: Learning Distribution Polarized Angle Prediction for Rotated Object Detection in Aerial Images
simplified_rbox_cnn -> RBox-CNN: rotated bounding box based CNN for ship detection in remote sensing image. Uses Tensorflow object detection API
Polar-Encodings -> Learning Polar Encodings for Arbitrary-Oriented Ship Detection in SAR Images
R-CenterNet -> detector for rotated-object based on CenterNet
piou -> Orientated Object Detection; IoU Loss, applied to DOTA dataset
DAFNe -> A One-Stage Anchor-Free Approach for Oriented Object Detection
AProNet -> Detecting objects with precise orientation from aerial images. Applied to datasets DOTA and HRSC2016
UCAS-AOD-benchmark -> A benchmark of UCAS-AOD dataset
RotateObjectDetection -> based on Ultralytics/yolov5, with adjustments to enable rotate prediction boxes. Also see PolygonObjectDetection
AD-Toolbox -> Aerial Detection Toolbox based on MMDetection and MMRotate, with support for more datasets
GGHL -> A General Gaussian Heatmap Label Assignment for Arbitrary-Oriented Object Detection
NPMMR-Det -> A Novel Nonlocal-Aware Pyramid and Multiscale Multitask Refinement Detector for Object Detection in Remote Sensing Images
AOPG -> Anchor-Free Oriented Proposal Generator for Object Detection
SE2-Det -> Semantic-Edge-Supervised Single-Stage Detector for Oriented Object Detection in Remote Sensing Imagery
OrientedRepPoints -> Oriented RepPoints for Aerial Object Detection
TS-Conv -> Task-wise Sampling Convolutions for Arbitrary-Oriented Object Detection in Aerial Images
FCOSR -> A Simple Anchor-free Rotated Detector for Aerial Object Detection. This implementation is modified from mmdetection. See also TensorRT_Inference
OBB_Detection -> Finalist's solution in the track of Oriented Object Detection in Remote Sensing Images, 2022 Guangdong-Hong Kong-Macao Greater Bay Area International Algorithm Competition
sam-mmrotate -> SAM (Segment Anything Model) for generating rotated bounding boxes with MMRotate, which is a comparison method of H2RBox-v2
mmrotate-dcfl -> Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection
h2rbox-mmrotate -> Horizontal Box Annotation is All You Need for Oriented Object Detection
Spatial-Transform-Decoupling -> Spatial Transform Decoupling for Oriented Object Detection
ARS-DETR -> Aspect Ratio Sensitive Oriented Object Detection with Transformer
CFINet -> Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning. Introduces SODA-A dataset
Object detection enhanced by super resolution
Super-Resolution and Object Detection -> Super-resolution is a relatively inexpensive enhancement that can improve object detection performance
Mid-Low Resolution Remote Sensing Ship Detection Using Super-Resolved Feature Representation
EESRGAN -> Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network. Applied to COWC & OGST datasets
FBNet -> Feature Balance for Fine-Grained Object Classification in Aerial Images
SuperYOLO -> SuperYOLO: Super Resolution Assisted Object Detection in Multimodal Remote Sensing Imagery
Salient object detection
Detecting the most noticeable or important object in a scene
ACCoNet -> Adjacent Context Coordination Network for Salient Object Detection in Optical Remote Sensing Images
MCCNet -> Multi-Content Complementation Network for Salient Object Detection in Optical Remote Sensing Images
CorrNet -> Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation
Reading list for deep learning based Salient Object Detection in Optical Remote Sensing Images
ORSSD-dataset -> salient object detection dataset
EORSSD-dataset -> Extended Optical Remote Sensing Saliency Detection (EORSSD) Dataset
DAFNet_TIP20 -> Dense Attention Fluid Network for Salient Object Detection in Optical Remote Sensing Images
EMFINet -> Edge-Aware Multiscale Feature Integration Network for Salient Object Detection in Optical Remote Sensing Images
ERPNet -> Edge-guided Recurrent Positioning Network for Salient Object Detection in Optical Remote Sensing Images
FSMINet -> Fully Squeezed Multi-Scale Inference Network for Fast and Accurate Saliency Detection in Optical Remote Sensing Images
AGNet -> AGNet: Attention Guided Network for Salient Object Detection in Optical Remote Sensing Images
MSCNet -> A lightweight multi-scale context network for salient object detection in optical remote sensing images
GPnet -> Global Perception Network for Salient Object Detection in Remote Sensing Images
SeaNet -> Lightweight Salient Object Detection in Optical Remote Sensing Images via Semantic Matching and Edge Alignment
GeleNet -> Salient Object Detection in Optical Remote Sensing Images Driven by Transformer
Object detection - Buildings, rooftops & solar panels
satellite_image_tinhouse_detector -> Detection of tin houses from satellite/aerial images using the Tensorflow Object Detection API
Machine Learning For Rooftop Detection and Solar Panel Installment discusses tiling large images and generating annotations from OSM data. Features of the roofs were calculated using a combination of contour detection and classification. Follow-up article using semantic segmentation
Building Extraction with YOLT2 and SpaceNet Data
XBD-hurricanes -> Models for building (and building damage) detection in high-resolution (<1m) satellite and aerial imagery using a modified RetinaNet model
Detecting solar panels from satellite imagery using segmentation
ssd-spacenet -> Detect buildings in the Spacenet dataset using Single Shot MultiBox Detector (SSD)
3DBuildingInfoMap -> simultaneous extraction of building height and footprint from Sentinel imagery using ResNet
DeepSolaris -> a EuroStat project to detect solar panels in aerial images, further material here
ML_ObjectDetection_CAFO -> Detect Concentrated Animal Feeding Operations (CAFO) in Satellite Imagery
Multi-level-Building-Detection-Framework -> Multilevel Building Detection Framework in Remote Sensing Images Based on Convolutional Neural Networks
Automatic Damage Annotation on Post-Hurricane Satellite Imagery -> detect damaged buildings using tensorflow object detection API. With repos here and here
mappingchallenge -> YOLOv5 applied to the AICrowd Mapping Challenge dataset
Object detection - Ships, boats, vessels & wake
Airbus Ship Detection Challenge -> using oriented bounding boxes. Read Detecting ships in satellite imagery: five years later…
kaggle-ships-in-Google-Earth-yolov8 -> Applying YOLOv8 to Kaggle Ships in Google Earth dataset
How hard is it for an AI to detect ships on satellite images?
Detecting Ships in Satellite Imagery using the Planet dataset and Keras
Ship detection using k-means clustering & CNN classifier on patches
SARfish -> Ship detection in Sentinel 1 Synthetic Aperture Radar (SAR) imagery
Arbitrary-Oriented Ship Detection through Center-Head Point Extraction
ship_detection -> using an interesting combination of CNN classifier, Class Activation Mapping (CAM) & UNET segmentation
Building a complete Ship detection algorithm using YOLOv3 and Planet satellite images -> covers finding and annotating data (using LabelMe), preprocessing large images into chips, and training Yolov3. Repo
Ship-detection-in-satellite-images -> experiments with UNET, YOLO, Mask R-CNN, SSD, Faster R-CNN, RETINA-NET
Ship-Detection-from-Satellite-Images-using-YOLOV4 -> uses Kaggle Airbus Ship Detection dataset
shipsnet-detector -> Detect container ships in Planet imagery using machine learning
Classifying Ships in Satellite Imagery with Neural Networks -> applied to the Kaggle Ships in Satellite Imagery dataset
Mask R-CNN for Ship Detection & Segmentation blog post with repo
Boat detection with multi-region-growing method in satellite images
small-boat-detector -> Trained yolo v3 model weights and configuration file to detect small boats in satellite imagery
Satellite-Imagery-Datasets-Containing-Ships -> A list of optical and radar satellite datasets for ship detection, classification, semantic segmentation and instance segmentation tasks
vessel-detection-sentinels -> Sentinel-1 and Sentinel-2 Vessel Detection
Ship-Detection -> CNN approach for ship detection in the ocean using a satellite image
vesselTracker -> Project based on reduced model of Yolov5 architecture using Pytorch. Custom dataset based on SAR imagery provided by Sentinel-1 through Earth Engine API
marine-debris-ml-model -> Marine Debris Detection using tensorflow object detection API
SDGH-Net -> Ship Detection in Optical Remote Sensing Images Based on Gaussian Heatmap Regression
LR-TSDet -> LR-TSDet: Towards Tiny Ship Detection in Low-Resolution Remote Sensing Images
FGSCR-42 -> A public Dataset for Fine-Grained Ship Classification in Remote sensing images
WakeNet -> Rethinking Automatic Ship Wake Detection: State-of-the-Art CNN-based Wake Detection via Optical Images
Histogram of Oriented Gradients (HOG) Boat Heading Classification
Object Detection in Satellite Imagery, a Low Overhead Approach -> Medium article which demonstrates how to combine Canny edge detector pre-filters with HOG feature descriptors, random forest classifiers, and sliding windows to perform ship detection
Ship-Detection-based-on-YOLOv3-and-KV260 -> entry project of the Xilinx Adaptive Computing Challenge 2021. It uses YOLOv3 for ship target detection in optical remote sensing images, and deploys DPU on the KV260 platform to achieve hardware acceleration
LEVIR-Ship -> a dataset for tiny ship detection under medium-resolution remote sensing images
Push-and-Pull-Network -> Contrastive Learning for Fine-grained Ship Classification in Remote Sensing Images
DRENet -> A Degraded Reconstruction Enhancement-Based Method for Tiny Ship Detection in Remote Sensing Images With a New Large-Scale Dataset
xView3-The-First-Place-Solution -> A winning solution for the xView 3 challenge (vessel detection, classification and length estimation on Sentinel-1 images). Contains trained models, inference pipeline and training code & configs to reproduce the results.
vessel-detection-viirs -> Model and service code for streaming vessel detections from VIIRS satellite imagery
wakemodel_llmassist -> wake detection in Sentinel-2, uses an EfficientNet-B0 architecture adapted for keypoint detection
ORFENet -> Tiny Object Detection in Remote Sensing Images Based on Object Reconstruction and Multiple Receptive Field Adaptive Feature Enhancement. Uses LEVIR-Ship & AI-TODv2 datasets
Object detection - Cars, vehicles & trains
Detection of parkinglots and driveways with retinanet
pytorch-vedai -> object detection on the VEDAI dataset: Vehicle Detection in Aerial Imagery
Truck Detection with Sentinel-2 during COVID-19 crisis -> moving objects in Sentinel-2 data cause a specific reflectance relationship in the RGB bands, which looks like a rainbow and serves as a marker for trucks. Accuracy is improved by only analysing roads. Not object detection, but relevant. Also see S2TD
cowc_car_counting -> car counting on the Cars Overhead With Context (COWC) dataset. Not strictly object detection, but a CNN that predicts the car count in a tile
CarCounting -> using Yolov3 & COWC dataset
Traffic density estimation as a regression problem instead of object detection
Rotation-EfficientDet-D0 -> PyTorch implementation of Rotated EfficientDet, applied to a custom rotation vehicle dataset (car counting)
RSVC2021-Dataset -> A dataset for Vehicle Counting in Remote Sensing images, created from the DOTA & ITCVD datasets
Car Localization and Counting with Overhead Imagery, an Interactive Exploration -> Medium article by Adam Van Etten
Vehicle-Counting-in-Very-Low-Resolution-Aerial-Images -> Vehicle Counting in Very Low-Resolution Aerial Images via Cross-Resolution Spatial Consistency and Intraresolution Time Continuity
Vehicle Detection blog post by Grant Pearse: detecting vehicles across New Zealand without collecting local training data
detecting-trucks -> detecting large vehicles in Sentinel-2
Object detection - Planes & aircraft
FlightScope_Bench -> A Deep Comprehensive Assessment of Aircraft Detection Algorithms in Satellite Imagery, including Faster RCNN, DETR, SSD, RTMdet, RetinaNet, CenterNet, YOLOv5, and YOLOv8
Faster RCNN to detect airplanes
yoltv4 includes examples on the RarePlanes dataset
aircraft-detection -> experiments to test the performance of a Gaussian process (GP) classifier with various kernels on the UC Merced land use land cover (LULC) dataset
aircraft-detection-from-satellite-images-yolov3 -> trained on kaggle cgi-planes-in-satellite-imagery-w-bboxes dataset
HRPlanesv2-Data-Set -> YOLOv4 and YOLOv5 weights trained on the HRPlanesv2 dataset
Deep-Learning-for-Aircraft-Recognition -> A CNN model trained to classify and identify various military aircraft through satellite imagery
FRCNN-for-Aircraft-Detection
ergo-planes-detector -> An ergo based project that relies on a convolutional neural network to detect airplanes from satellite imagery, uses the PlanesNet dataset
pytorch-remote-sensing -> Aircraft detection using the 'Airbus Aircraft Detection' dataset and Faster-RCNN with ResNet-50 backbone using pytorch
FasterRCNN_ObjectDetection -> faster RCNN model for aircraft detection and localisation in satellite images and creating a webpage with live server for public usage
HRPlanes -> weights of YOLOv4 and Faster R-CNN networks trained with HRPlanes dataset
aerial-detection -> uses Yolov5 & Icevision
How to choose a deep learning architecture to detect aircrafts in satellite imagery?
rareplanes-yolov5 -> using YOLOv5 and the RarePlanes dataset to detect and classify sub-characteristics of aircraft, with article
OnlyPlanes -> Incrementally Tuning Synthetic Training Datasets for Satellite Object Detection
Understanding the RarePlanes Dataset and Building an Aircraft Detection Model -> blog post
Object detection - Infrastructure & utilities
wind-turbine-detector -> Wind Turbine Object Detection from Aerial Imagery Using TensorFlow Object Detection API
Water Tanks and Swimming Pools Detection -> uses Faster R-CNN
PCAN -> Part-Based Context Attention Network for Thermal Power Plant Detection in Remote Sensing Imagery, with dataset
WindTurbineDetection -> Implementation of transfer learning approach using the YOLOv7 framework to detect and rapidly quantify wind turbines in raw LANDSAT and NAIP satellite imagery
Object detection - Oil storage tank detection
Oil is stored in tanks at many points between extraction and sale, and the volume of oil in storage is an important economic indicator.
A Beginner’s Guide To Calculating Oil Storage Tank Occupancy With Help Of Satellite Imagery
Oil Storage Tank’s Volume Occupancy On Satellite Imagery Using YoloV3 with repo
Oil-Tank-Volume-Estimation -> combines object detection and classical computer vision
SubpixelCircleDetection -> CIRCULAR-SHAPED OBJECT DETECTION IN LOW RESOLUTION SATELLITE IMAGES
oil_storage-detector -> using yolov5 and the Airbus Oil Storage Detection dataset
oil_well_detector -> detect oil wells in the Bakken oil field based on satellite imagery
Oil Storage Detection on Airbus Imagery with YOLOX -> uses the Kaggle Airbus Oil Storage Detection dataset
AContrarioTankDetection -> Oil Tank Detection in Satellite Images via a Contrario Clustering
Object detection - Animals
A variety of techniques can be used to count animals, including object detection and instance segmentation. For convenience they are all listed here:
cownter_strike -> counting cows, located with point-annotations, two models: CSRNet (a density-based method) & LCFCN (a detection-based method)
elephant_detection -> Using Keras-Retinanet to detect elephants from aerial images
CNN-Mosquito-Detection -> determining the locations of potentially dangerous breeding grounds, compared YOLOv4, YOLOR & YOLOv5
Borowicz_etal_Spacewhale -> locate whales using ResNet
walrus-detection-and-count -> uses Mask R-CNN instance segmentation
MarineMammalsDetection -> Weakly Supervised Detection of Marine Animals in High Resolution Aerial Images
Audubon_F21 -> Deep object detection for waterbird monitoring using aerial imagery
Object detection - Miscellaneous
Object detection on Satellite Imagery using RetinaNet -> using the Kaggle Swimming Pool and Car Detection dataset
Tackling the Small Object Problem in Object Detection
Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review
awesome-aerial-object-detection by murari023, another by visionxiang, and awesome-tiny-object-detection list many relevant papers
Object Detection Accuracy as a Function of Image Resolution -> Medium article using COWC dataset, performance rapidly degrades below 30cm imagery
Satellite Imagery Multiscale Rapid Detection with Windowed Networks (SIMRDWN) -> combines some of the leading object detection algorithms into a unified framework designed to detect objects both large and small in overhead imagery. Train models and test on arbitrary image sizes with YOLO (versions 2 and 3), Faster R-CNN, SSD, or R-FCN.
YOLTv4 -> YOLTv4 is designed to detect objects in aerial or satellite imagery in arbitrarily large images that far exceed the ~600×600 pixel size typically ingested by deep learning object detection frameworks. Read Announcing YOLTv4: Improved Satellite Imagery Object Detection
Tensorflow Benchmarks for Object Detection in Aerial Images
Pytorch Benchmarks for Object Detection in Aerial Images
ASPDNet -> Counting dense objects in remote sensing images
xview-yolov3 -> xView 2018 Object Detection Challenge: YOLOv3 Training and Inference
Faster RCNN for xView satellite data challenge
How to detect small objects in (very) large images -> A practical guide to using Slicing-Aided Hyper Inference (SAHI) for performing inference on the DOTAv1.0 object detection dataset using the mmdetection framework
Object Detection Satellite Imagery Multi-vehicles Dataset (SIMD) -> RetinaNet, Yolov3 and Faster RCNN for multi object detection on satellite images dataset
SNIPER/AutoFocus -> an efficient multi-scale object detection training/inference algorithm
marine_debris_ML -> Marine debris detection, uses 3-meter imagery product called Planetscope with bands in the red, green, blue, and near-infrared. Uses Tensorflow Object Detection API with pre-trained resnet 101
pool-detection-from-aerial-imagery -> Use Icevision and Detectron2 to detect swimming pools from aerial imagery
Electric-Pylon-Detection-in-RSI -> a dataset which contains 1500 remote sensing images of electric pylons used to train ten deep learning models
IS-Count -> IS-Count is a sampling-based and learnable method for estimating the total object count in a region
Object Detection On Aerial Imagery Using RetinaNet
Clustered-Object-Detection-in-Aerial-Image
yolov5s_for_satellite_imagery -> yolov5s applied to the DOTA dataset
RetinaNet-PyTorch -> RetinaNet implementation on remote sensing ship dataset (SSDD)
Detecting-Cyclone-Centers-Custom-YOLOv3 -> tropical cyclones (TCs) are intense warm-cored cyclonic vortices, developed from low-pressure systems over the tropical oceans and driven by complex air-sea interaction
Object-Detection-YoloV3-RetinaNet-FasterRCNN -> trained on a private dataset
Google-earth-Object-Recognition -> Code for training and evaluating on Dior Dataset (Google Earth Images) using RetinaNet and YOLOV5
HIECTOR: Hierarchical object detector at scale -> HIECTOR facilitates multiple satellite data collections of increasingly detailed spatial resolution for a cost-efficient and accurate object detection over large areas. Code
Detection of Multiclass Objects in Optical Remote Sensing Images
SB-MSN -> Improving Training Instance Quality in Aerial Image Object Detection With a Sampling-Balance-Based Multistage Network
yoltv5 -> detects objects in arbitrarily large aerial or satellite images that far exceed the ~600×600 pixel size typically ingested by deep learning object detection frameworks. Uses YOLOv5 & pytorch
AIR -> A deep learning object detector framework written in Python for supporting Land Search and Rescue Missions
dior_detect -> benchmarks for object detection on DIOR dataset
Panchromatic to Multispectral: Object Detection Performance as a Function of Imaging Bands -> Medium article, concludes that more bands are not always beneficial, but likely varies by use case
OPLD-Pytorch -> Learning Point-Guided Localization for Detection in Remote Sensing Images
F3Net -> Feature Fusion and Filtration Network for Object Detection in Optical Remote Sensing Images
GLNet -> Global to Local: Clip-LSTM-Based Object Detection From Remote Sensing Images
SRAF-Net -> A Scene-Relevant Anchor-Free Object Detection Network in Remote Sensing Images
object_detection_in_remote_sensing_images -> using CNN and attention mechanism
SHAPObjectDetection -> SHAP-Based Interpretable Object Detection Method for Satellite Imagery
NWD -> A Normalized Gaussian Wasserstein Distance for Tiny Object Detection. Uses AI-TOD dataset
MSFC-Net -> Multiscale Semantic Fusion-Guided Fractal Convolutional Object Detection Network for Optical Remote Sensing Imagery
LO-Det -> LO-Det: Lightweight Oriented Object Detection in Remote Sensing Images
R2IPoints -> Pursuing Rotation-Insensitive Point Representation for Aerial Object Detection
Object-Detection -> Multi-Scale Object Detection with the Pixel Attention Mechanism in a Complex Background
mmdet-rfla -> RFLA: Gaussian Receptive based Label Assignment for Tiny Object Detection
Interactive-Multi-Class-Tiny-Object-Detection -> Interactive Multi-Class Tiny-Object Detection
small-object-detection-benchmark -> Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection (SAHI)
OD-Satellite-iSAID -> Object Detection in Aerial Images: A Case Study on Performance Improvement using iSAID
Large-Selective-Kernel-Network -> Large Selective Kernel Network for Remote Sensing Object Detection
Satellite_Imagery_Detection_YOLOV7 -> YOLOV7 applied to xView1 Dataset
FSANet -> FSANet: Feature-and-Spatial-Aligned Network for Tiny Object Detection in Remote Sensing Images
OAN -> Fewer is More: Efficient Object Detection in Large Aerial Images, based on MMdetection
DOTA-C -> evaluating the robustness of object detection models to 19 types of image quality degradation
Satellite-Remote-Sensing-Image-Object-Detection -> using RefineDet & DOTA dataset
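Several entries above (SAHI, YOLTv4/yoltv5, SIMRDWN) run a detector over tiles of a large image and then merge the per-tile results. The sketch below shows one simple way to do the merge. It assumes that boxes are axis-aligned `(x1, y1, x2, y2)` in tile-local pixels and that each tile's `(row, col)` offset is known. Duplicates along tile overlaps are then removed with plain greedy non-maximum suppression (NMS). The function names and the 0.5 IoU threshold are illustrative assumptions, and the tools listed here may use other merging strategies.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def shift_box(box, row_off, col_off):
    """Map a tile-local box into full-scene coordinates."""
    x1, y1, x2, y2 = box
    return (x1 + col_off, y1 + row_off, x2 + col_off, y2 + row_off)


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box and drop boxes that overlap it too much.

    Returns the indices of kept boxes, in descending score order.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# The same car is seen by two horizontally overlapping tiles (offsets 0 and 576).
tile_a = [((600, 100, 630, 120), 0.9)]
tile_b = [((25, 100, 54, 120), 0.8), ((10, 10, 30, 30), 0.7)]
boxes, scores = [], []
for dets, (r, c) in ((tile_a, (0, 0)), (tile_b, (0, 576))):
    for box, score in dets:
        boxes.append(shift_box(box, r, c))
        scores.append(score)
print(nms(boxes, scores))  # [0, 2]
```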
Object counting
When the object count is required but its shape is not, U-Net can be used to treat the task as an image-to-image translation problem.
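A common way to set up this image-to-image formulation is density-map regression. The network predicts a map whose pixel values sum to the object count. Below is a minimal sketch of how such a training target can be built from point annotations. It assumes (row, col) pixel coordinates and an illustrative Gaussian width `sigma`, and `density_map` is a hypothetical helper. The U-Net itself is omitted.

```python
import numpy as np


def density_map(points, shape, sigma=2.0):
    """Build a target density map by placing a normalised Gaussian at each point."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=np.float64)
    for r, c in points:
        g = np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * sigma ** 2))
        # Normalising over the image grid makes each object contribute exactly 1,
        # even when its Gaussian is clipped at the image border
        density += g / g.sum()
    return density


points = [(5, 5), (10, 20), (0, 31)]  # hypothetical object centres
target = density_map(points, (32, 32))
count = target.sum()  # the count is recovered by integrating the map
```

At inference, the same summation applied to the predicted map gives the estimated count. This is why the object's shape never has to be segmented.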
centroid-unet -> Centroid-UNet is a deep neural network model to detect centroids from satellite images
DO-U-Net -> an effective approach for when the size of an object needs to be known, as well as the number of objects in the image, initially created to segment and count Internally Displaced People (IDP) camps in Afghanistan
Cassava Crop Counting
Counting from Sky -> A Large-scale Dataset for Remote Sensing Object Counting and A Benchmark Method
PSGCNet -> PSGCNet: A Pyramidal Scale and Global Context Guided Network for Dense Object Counting in Remote Sensing Images
psgcnet -> A Pyramidal Scale and Global Context Guided Network for Dense Object Counting in Remote-Sensing Images
![Regression](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/regression.png)
Regression in remote sensing involves predicting continuous variables such as wind speed, tree height, or soil moisture from an image. Both classical machine learning and deep learning approaches can be used to accomplish this task. Classical machine learning utilizes feature engineering to extract numerical values from the input data, which are then used as input for a regression algorithm like linear regression. On the other hand, deep learning typically employs a convolutional neural network (CNN) to process the image data, followed by a fully connected neural network (FCNN) for regression. The FCNN is trained to map the input image to the desired output, providing predictions for the continuous variables of interest. Image source
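The classical pipeline described above (hand-crafted features, then a linear regressor) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions. The features are per-band mean and standard deviation of each image patch. The target is synthetic and hypothetical, standing in for a variable such as tree height. The regression is ordinary least squares via `np.linalg.lstsq`.

```python
import numpy as np


def extract_features(patch):
    """Hand-crafted features: per-band mean and std of a (bands, H, W) patch."""
    return np.concatenate([patch.mean(axis=(1, 2)), patch.std(axis=(1, 2))])


def fit_linear_regression(X, y):
    """Ordinary least squares with an appended intercept column."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ coef


# Synthetic data: 200 random 3-band 16x16 patches; the hypothetical target is
# a linear function of the first two band means
rng = np.random.default_rng(0)
patches = rng.random((200, 3, 16, 16))
X = np.stack([extract_features(p) for p in patches])
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5
coef = fit_linear_regression(X, y)
pred = predict(coef, X)
```

In the deep learning variant, a CNN learns the features. `extract_features` is then replaced by the convolutional backbone, and the least-squares fit is replaced by a fully connected head trained by gradient descent.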
python-windspeed -> Predicting windspeed of hurricanes from satellite images, uses CNN regression in keras
hurricane-wind-speed-cnn -> Predicting windspeed of hurricanes from satellite images, uses CNN regression in keras
GEDI-BDL -> Global canopy height regression and uncertainty estimation from GEDI LIDAR waveforms with deep ensembles
Global-Canopy-Height-Map -> Estimating Canopy Height at Scale (ICML2024)
HighResCanopyHeight -> code for Meta paper: Very high resolution canopy height maps from RGB imagery using self-supervised vision transformer and convolutional decoder trained on Aerial Lidar
Traffic density estimation as a regression problem instead of object detection -> inspired by paper: Traffic density estimation method from small satellite imagery: Towards frequent remote sensing of car traffic
OpticalWaveGauging_DNN -> Optical wave gauging using deep neural networks
satellite-pose-estimation -> adapts a ResNet50 model architecture to perform pose estimation on several series of satellite images (both real and synthetic)
Tropical Cyclone Wind Estimation Competition -> on RadiantEarth MLHub
DengueNet -> DengueNet: Dengue Prediction using Spatiotemporal Satellite Imagery for Resource-Limited Countries
tropical_cyclone_uq -> Uncertainty Aware Tropical Cyclone Wind Speed Estimation from Satellite Data
![Cloud detection](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/clouds.png)
Clouds are a major issue in remote sensing images as they can obscure the underlying ground features. This hinders the accuracy and effectiveness of remote sensing analysis, as the obscured regions cannot be properly interpreted. In order to address this challenge, various techniques have been developed to detect clouds in remote sensing images. Both classical algorithms and deep learning approaches can be employed for cloud detection. Classical algorithms typically use threshold-based techniques and hand-crafted features to identify cloud pixels. However, these techniques can be limited in their accuracy and are sensitive to changes in image appearance and cloud structure. On the other hand, deep learning approaches leverage the power of convolutional neural networks (CNNs) to accurately detect clouds in remote sensing images. These models are trained on large datasets of remote sensing images, allowing them to learn and generalize the unique features and patterns of clouds. The generated cloud mask can be used to identify the cloud pixels and eliminate them from further analysis or, alternatively, cloud inpainting techniques can be used to fill in the gaps left by the clouds. This approach helps to improve the accuracy of remote sensing analysis and provides a clearer view of the ground, even in the presence of clouds. Image adapted from the paper 'Refined UNet Lite: End-to-End Lightweight Network for Edge-precise Cloud Detection'
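The threshold-based classical approach can be illustrated with a toy rule. Clouds tend to be bright across the visible bands and spectrally flat ("white"). This is a minimal sketch, not any published algorithm such as Fmask. It assumes top-of-atmosphere reflectances in [0, 1], and both thresholds are illustrative placeholders. The resulting mask is then used to exclude cloudy pixels from further analysis.

```python
import numpy as np


def threshold_cloud_mask(blue, green, red, brightness_thresh=0.3, whiteness_thresh=0.1):
    """Flag pixels that are both bright and spectrally flat as cloud.

    Thresholds are hypothetical; real algorithms tune them per sensor and
    usually add further tests (e.g. thermal or cirrus bands).
    """
    visible = np.stack([blue, green, red])                 # (3, H, W)
    brightness = visible.mean(axis=0)                      # clouds are bright
    whiteness = np.abs(visible - brightness).mean(axis=0)  # and roughly grey/white
    return (brightness > brightness_thresh) & (whiteness < whiteness_thresh)


# 2x2 toy scene: a white cloud pixel, a bright but red pixel, two dark pixels
blue = np.array([[0.50, 0.05], [0.05, 0.10]])
green = np.array([[0.50, 0.05], [0.06, 0.10]])
red = np.array([[0.52, 0.90], [0.07, 0.10]])
mask = threshold_cloud_mask(blue, green, red)
cloud_fraction = mask.mean()
red_clear = np.where(mask, np.nan, red)  # drop cloudy pixels before analysis
```

The bright red pixel is rejected by the whiteness test. This shows why hand-crafted rules combine several cues. It also shows their limits: bright, flat surfaces such as snow or white roofs can still pass.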
CloudSEN12 -> Sentinel 2 cloud dataset with a variety of models here
This article on Sentinel Hub describes three popular classical algorithms that apply thresholds across multiple bands to identify clouds. The same article proposes using semantic segmentation combined with a CNN as a cloud classifier (excellent review paper here), but states that this requires too many compute resources.
This article compares a number of ML algorithms: random forests, stochastic gradient descent, support vector machines and a Bayesian method.
Segmentation of Clouds in Satellite Images Using Deep Learning -> semantic segmentation using a Unet on the Kaggle 38-Cloud dataset
Cloud Detection in Satellite Imagery -> compares FPN+ResNet18 and CheapLab architectures on Sentinel-2 L1C and L2A imagery
Benchmarking Deep Learning models for Cloud Detection in Landsat-8 and Sentinel-2 images
Landsat-8 to Proba-V Transfer Learning and Domain Adaptation for Cloud detection
Multitemporal Cloud Masking in Google Earth Engine
s2cloudmask -> Sentinel-2 Cloud and Shadow Detection using Machine Learning
sentinel2-cloud-detector -> Sentinel Hub Cloud Detector for Sentinel-2 images in Python
dsen2-cr -> cloud removal in Sentinel-2 imagery using a deep residual neural network and SAR-optical data fusion, contains the model code, written in Python/Keras, as well as links to pre-trained checkpoints and the SEN12MS-CR dataset
pyatsa -> Python package implementing the Automated Time-Series Analysis method for masking clouds in satellite imagery developed by Zhu and Helmer 2018
decloud -> Decloud enables the training of various deep nets to remove clouds in optical image, using e.g. Sentinel 1 & 2
cloudless -> Deep learning pipeline for orbital satellite data for detecting clouds
Deep-Gapfill -> Official implementation of Optical image gap filling using deep convolutional autoencoder from optical and radar images
satellite-cloud-removal-dip -> Satellite cloud removal with Deep Image Prior, with paper
cloudFCN -> Python 3 package for Fully Convolutional Network development, specifically for cloud masking
Fmask -> Fmask (Function of mask) is used for automated masking of clouds, cloud shadows, snow and water in Landsat 4-9 and Sentinel-2 images, in Matlab. Also see PyFmask
HOW TO USE DEEP LEARNING, PYTORCH LIGHTNING, AND THE PLANETARY COMPUTER TO PREDICT CLOUD COVER IN SATELLITE IMAGERY
cloud-cover-winners -> winning submissions for the On Cloud N: Cloud Cover Detection Challenge
On-Cloud-N: Cloud Cover Detection Challenge - 19th Place Solution
ukis-csmask -> package to mask clouds in Sentinel-2, Landsat-8, Landsat-7 and Landsat-5 images
OpenSICDR -> long list of satellite image cloud detection resources
RS-Net -> A cloud detection algorithm for satellite imagery based on deep learning
Clouds-Segmentation-Project -> treats this as a 3-class problem (open clouds, closed clouds and no clouds); uses PyTorch on a dataset of IR and visual grayscale images
STGAN -> STGAN for Cloud Removal in Satellite Images
mcgan-cvprw2017-pytorch -> Filmy Cloud Removal on Satellite Imagery with Multispectral Conditional Generative Adversarial Nets
Cloud-Net: A semantic segmentation CNN for cloud detection -> an end-to-end cloud detection algorithm for Landsat 8 imagery, trained on 38-Cloud Training Set
fcd -> Fixed-Point GAN for Cloud Detection. A weakly-supervised approach, training with only image-level labels
CloudX-Net -> an efficient and robust architecture used for detection of clouds from satellite images
A simple cloud-detection walk-through using Convolutional Neural Network (CNN and U-Net) and fast.ai library
38Cloud-Medium -> Walk-through using u-net to detect clouds in satellite images with fast.ai
cloud_detection_using_satellite_data -> performed on Sentinel 2 data
Luojia1-Cloud-Detection -> Luojia-1 Satellite Visible Band Nighttime Imagery Cloud Detection
SEN12MS-CR-TS -> A Remote Sensing Data Set for Multi-modal Multi-temporal Cloud Removal
ES-CCGAN -> a dehazing method for remote sensing images based on CycleGAN
Cloud_Classification_DL -> Classifying cloud organization patterns from satellite images using Deep Learning techniques (Mask R-CNN)
CNN-based-Cloud-Detection-Methods -> Understanding the Role of Receptive Field of Convolutional Neural Network for Cloud Detection in Landsat 8 OLI Imagery
cloud-removal-deploy -> flask app for cloud removal
CloudMattingGAN -> Generative Adversarial Training for Weakly Supervised Cloud Matting
km_predict -> KappaMask, or km-predict, is a cloud detector for Sentinel-2 Level-1C and Level-2A input products applied to S2 full image prediction
CDnet -> CNN-Based Cloud Detection for Remote Sensing Imagery
CDnetV2 -> CNN-Based Cloud Detection for Remote Sensing Imagery With Cloud-Snow Coexistence
grouped-features-alignment -> Unsupervised Domain Adaptation for Cloud Detection Based on Grouped Features Alignment and Entropy Minimization
Detecting Cloud Cover Via Sentinel-2 Satellite Data -> blog post on Benjamin Warner's Top-10 Percent Solution to DrivenData’s On Cloud N Competition using fast.ai and a customized version of XResNeXt50. Repo
AISD -> Deeply supervised convolutional neural network for shadow detection based on a novel aerial shadow imagery dataset
CloudGAN -> Detecting and Removing Clouds from RGB-images using Image Inpainting
Using GANs to Augment Data for Cloud Image Segmentation Task
Cloud-Segmentation-from-Satellite-Imagery -> applied to Sentinel-2 dataset
HRC_WHU -> High-Resolution Cloud Detection Dataset comprising 150 RGB images from different global regions, with resolutions varying from 0.5 to 15 m
MEcGANs -> Cloud Removal from Satellite Imagery using Multispectral Edge-filtered Conditional Generative Adversarial Networks
CloudXNet -> CloudX-net: A robust encoder-decoder architecture for cloud detection from satellite remote sensing images
cloud-buster -> Sentinel-2 L1C and L2A Imagery with Fewer Clouds
SatelliteCloudGenerator -> A PyTorch-based tool to generate clouds for satellite images
SEnSeI -> A python 3 package for developing sensor independent deep learning models for cloud masking in satellite imagery
cloud-detection-venus -> Using Convolutional Neural Networks for Cloud Detection on VENμS Images over Multiple Land-Cover Types
explaining_cloud_effects -> Explaining the Effects of Clouds on Remote Sensing Scene Classification
Clouds-Images-Segmentation -> Marine Stratocumulus Cloud-Type Classification from SEVIRI Using Convolutional Neural Networks
DeCloud-GAN -> DeCloud GAN: An Advanced Generative Adversarial Network for Removing Cloud Cover in Optical Remote Sensing Imagery
cloud_segmentation_comparative -> BenchCloudVision: A Benchmark Analysis of Deep Learning Approaches for Cloud Detection and Segmentation in Remote Sensing Imagery
PLFM-Clouds-Removal -> Spatio-Temporal SAR-Optical Data Fusion for Cloud Removal via a Deep Hierarchical Model
Cloud-removal-model-collection -> A collection of the existing end-to-end cloud removal models
SEnSeIv2 -> Sensor Independent Cloud and Shadow Masking with Ambiguous Labels and Multimodal Inputs
UnCRtainTS -> Uncertainty Quantification for Cloud Removal in Optical Satellite Time Series
U-TILISE -> A Sequence-to-sequence Model for Cloud Removal in Optical Satellite Time Series
![Change detection](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/change.png)
Change detection is a vital component of remote sensing analysis, enabling the monitoring of landscape changes over time. This technique can be applied to identify a wide range of changes, including land use changes, urban development, coastal erosion, and deforestation. Change detection can be performed on a pair of images taken at different times, or by analyzing multiple images collected over a period of time. It is important to note that while change detection is primarily used to detect changes in the landscape, it can also be influenced by the presence of clouds and shadows. These dynamic elements can alter the appearance of the image, leading to false positives in change detection results. Therefore, it is essential to consider the impact of clouds and shadows on change detection analysis, and to employ appropriate methods to mitigate their influence. Image source
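For a pair of co-registered images, a classical unsupervised baseline is change vector analysis (CVA). It computes the per-pixel spectral difference vector between the two dates and thresholds its length. This is a minimal sketch, assuming (bands, H, W) arrays that are already radiometrically comparable. The `mean + k * std` threshold is an illustrative choice; Otsu's method is a common alternative. Clouds or shadows present in only one date would also produce large change vectors, which is the false-positive problem described above.

```python
import numpy as np


def change_vector_analysis(img_t1, img_t2, k=2.0):
    """Return (magnitude, change_mask) for two co-registered (bands, H, W) images."""
    delta = img_t2.astype(float) - img_t1.astype(float)
    magnitude = np.sqrt((delta ** 2).sum(axis=0))  # length of each pixel's change vector
    threshold = magnitude.mean() + k * magnitude.std()
    return magnitude, magnitude > threshold


rng = np.random.default_rng(0)
t1 = rng.random((3, 20, 20))
t2 = t1 + rng.normal(0.0, 0.01, t1.shape)  # small radiometric noise between dates
t2[:, 5:10, 5:10] += 0.5                   # a simulated 5x5 px change
magnitude, change_mask = change_vector_analysis(t1, t2)
```

The direction of the change vector, which this sketch ignores, can additionally be used to separate types of change, for example vegetation loss vs. gain.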
awesome-remote-sensing-change-detection lists many datasets and publications
Change-Detection-Review -> A review of change detection methods, including code and open data sets for deep learning
Change Detection using Siamese Networks
STANet -> STANet for remote sensing image change detection
UNet-based-Unsupervised-Change-Detection -> A convolutional neural network (CNN) and semantic segmentation are implemented to detect the changes between the images, as well as classify the changes into the correct semantic class
BIT_CD -> Official Pytorch Implementation of Remote Sensing Image Change Detection with Transformers
Unstructured-change-detection-using-CNN
Siamese neural network to detect changes in aerial images -> uses Keras and VGG16 architecture
Change Detection in 3D: Generating Digital Elevation Models from Dove Imagery
QGIS plugin for applying change detection algorithms on high resolution satellite imagery
LamboiseNet -> Master thesis about change detection in satellite imagery using Deep Learning
Fully Convolutional Siamese Networks for Change Detection
Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks -> used the Onera Satellite Change Detection (OSCD) dataset
IAug_CDNet -> Official Pytorch Implementation of Adversarial Instance Augmentation for Building Change Detection in Remote Sensing Images
dpm-rnn-public -> Code implementing a damage mapping method combining satellite data with deep learning
SenseEarth2020-ChangeDetection -> 1st place solution to the Satellite Image Change Detection Challenge hosted by SenseTime; predictions of five HRNet-based segmentation models are ensembled, serving as pseudo labels of unchanged areas
KPCAMNet -> Python implementation of the paper Unsupervised Change Detection in Multi-temporal VHR Images Based on Deep Kernel PCA Convolutional Mapping Network
CDLab -> benchmarking deep learning-based change detection methods.
Siam-NestedUNet -> SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images
SUNet-change_detection -> Implementation of paper SUNet: Change Detection for Heterogeneous Remote Sensing Images from Satellite and UAV Using a Dual-Channel Fully Convolution Network
Self-supervised Change Detection in Multi-view Remote Sensing Images
MFPNet -> Remote Sensing Change Detection Based on Multidirectional Adaptive Feature Fusion and Perceptual Similarity
GitHub for the DIUx xView Detection Challenge -> The xView2 Challenge focuses on automating the process of assessing building damage after a natural disaster
DASNet -> Dual attentive fully convolutional siamese networks for change detection of high-resolution satellite images
Self-Attention for Raw Optical Satellite Time Series Classification
planet-movement -> Find and process Planet image pairs to highlight object movement
temporal-cluster-matching -> detecting change in structure footprints from time series of remotely sensed imagery
autoRIFT -> fast and intelligent algorithm for finding the pixel displacement between two images
DSAMNet -> A Deeply Supervised Attention Metric-Based Network and an Open Aerial Image Dataset for Remote Sensing Change Detection
SRCDNet -> Super-resolution-based Change Detection Network with Stacked Attention Module for Images with Different Resolutions. SRCDNet is designed to learn and predict change maps from bi-temporal images with different resolutions
Land-Cover-Analysis -> Land Cover Change Detection using Satellite Image Segmentation
A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images
Satellite-Image-Alignment-Differencing-and-Segmentation
Change Detection in Multi-temporal Satellite Images -> uses Principal Component Analysis (PCA) and K-means clustering
Unsupervised Change Detection Algorithm using PCA and K-Means Clustering -> in Matlab but has paper
ChangeFormer -> A Transformer-Based Siamese Network for Change Detection. Uses transformer architecture to address the limitations of CNN in handling multi-scale long-range details. Demonstrates that ChangeFormer captures much finer details compared to the other SOTA methods, achieving better performance on benchmark datasets
Heterogeneous_CD -> Heterogeneous Change Detection in Remote Sensing Images
ChangeDetectionProject -> Trying out Active Learning with deep CNNs for change detection on remote sensing data
DSFANet -> Unsupervised Deep Slow Feature Analysis for Change Detection in Multi-Temporal Remote Sensing Images
siamese-change-detection -> Targeted synthesis of multi-temporal remote sensing images for change detection using siamese neural networks
Bi-SRNet -> Bi-Temporal Semantic Reasoning for the Semantic Change Detection in HR Remote Sensing Images
SiROC -> Spatial Context Awareness for Unsupervised Change Detection in Optical Satellite Images. Applied to Sentinel-2 and high-resolution Planetscope imagery on four datasets
DSMSCN -> Tensorflow implementation for Change Detection in Multi-temporal VHR Images Based on Deep Siamese Multi-scale Convolutional Neural Networks
RaVAEn -> a lightweight, unsupervised approach for change detection in satellite data based on Variational Auto-Encoders (VAEs) with the specific purpose of on-board deployment. It flags changed areas to prioritise for downlink, shortening the response time
SemiCD -> Revisiting Consistency Regularization for Semi-supervised Change Detection in Remote Sensing Images. Achieves the performance of supervised CD even with access to as little as 10% of the annotated training data
FCCDN_pytorch -> FCCDN: Feature Constraint Network for VHR Image Change Detection. Uses the LEVIR-CD building change detection dataset
INLPG_Python -> Structure Consistency based Graph for Unsupervised Change Detection with Homogeneous and Heterogeneous Remote Sensing Images
NSPG_Python -> Nonlocal patch similarity based heterogeneous remote sensing change detection
DS_UNet -> Sentinel-1 and Sentinel-2 Data Fusion for Urban Change Detection using a Dual Stream U-Net, uses Onera Satellite Change Detection dataset
SiameseSSL -> Urban change detection with a Dual-Task Siamese network and semi-supervised learning. Uses SpaceNet 7 dataset
CD-SOTA-methods -> Remote sensing change detection: State-of-the-art methods and available datasets
multimodalCD_ISPRS21 -> Fusing Multi-modal Data for Supervised Change Detection
Unsupervised-CD-in-SITS-using-DL-and-Graphs -> Unsupervised Change Detection Analysis in Satellite Image Time Series using Deep Learning Combined with Graph-Based Approaches
LSNet -> Extremely Light-Weight Siamese Network For Change Detection in Remote Sensing Image
Change-Detection-in-Remote-Sensing-Images -> using PCA & K-means
End-to-end-CD-for-VHR-satellite-image -> End-to-End Change Detection for High Resolution Satellite Images Using Improved UNet++
Semantic-Change-Detection -> SCDNET: A novel convolutional network for semantic change detection in high resolution optical remote sensing imagery
ERCNN-DRS_urban_change_monitoring -> Neural Network-Based Urban Change Monitoring with Deep-Temporal Multispectral and SAR Remote Sensing Data
EGRCNN -> Edge-guided Recurrent Convolutional Neural Network for Multi-temporal Remote Sensing Image Building Change Detection
Unsupervised-Remote-Sensing-Change-Detection -> An Unsupervised Remote Sensing Change Detection Method Based on Multiscale Graph Convolutional Network and Metric Learning
CropLand-CD -> A CNN-transformer Network with Multi-scale Context Aggregation for Fine-grained Cropland Change Detection
contrastive-surface-image-pretraining -> Supervising Remote Sensing Change Detection Models with 3D Surface Semantics
dcvaVHROptical -> Unsupervised Deep Change Vector Analysis for Multiple-Change Detection in VHR Images
hyperdimensionalCD -> Change Detection in Hyperdimensional Images Using Untrained Models
FCD-GAN-pytorch -> Fully Convolutional Change Detection Framework with Generative Adversarial Network (FCD-GAN) is a framework for change detection in multi-temporal remote sensing images
DARNet-CD -> A Densely Attentive Refinement Network for Change Detection Based on Very-High-Resolution Bitemporal Remote Sensing Images
xView2_Vulcan -> Damage assessment using pre and post orthoimagery. Modified + productionized model based on the first-place model from the xView2 challenge.
ESCNet -> An End-to-End Superpixel-Enhanced Change Detection Network for Very-High-Resolution Remote Sensing Images
ForestCoverChange -> Detecting and Predicting Forest Cover Change in Pakistani Areas Using Remote Sensing Imagery
deforestation-detection -> DEEP LEARNING FOR HIGH-FREQUENCY CHANGE DETECTION IN UKRAINIAN FOREST ECOSYSTEM WITH SENTINEL-2
SentinelClearcutDetection -> Scripts for deforestation detection on the Sentinel-2 Level-A images
clearcut_detection -> research & web-service for clearcut detection
CDRL -> Unsupervised Change Detection Based on Image Reconstruction Loss
ddpm-cd -> Remote Sensing Change Detection (Segmentation) using Denoising Diffusion Probabilistic Models
Remote-sensing-time-series-change-detection -> Graph-based block-level urban change detection using Sentinel-2 time series
austin-ml-change-detection-demo -> A change detection demo for the Austin area using a pre-trained PyTorch model scaled with Dask on Planet imagery
dfc2021-msd-baseline -> Multitemporal Semantic Change Detection track of the 2021 IEEE GRSS Data Fusion Competition
CorrFusionNet -> Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion
ChangeDetectionPCAKmeans -> Unsupervised Change Detection in Satellite Images Using Principal Component Analysis and k-Means Clustering.
IRCNN -> IRCNN: An Irregular-Time-Distanced Recurrent Convolutional Neural Network for Change Detection in Satellite Time Series
UTRNet -> An Unsupervised Time-Distance-Guided Convolutional Recurrent Network for Change Detection in Irregularly Collected Images
open-cd -> an open source change detection toolbox based on a series of open source general vision task tools
Tiny_model_4_CD -> TINYCD: A (Not So) Deep Learning Model For Change Detection. Uses LEVIR-CD & WHU-CD datasets
FHD -> Feature Hierarchical Differentiation for Remote Sensing Image Change Detection
Change detection with Raster Vision -> blog post with Colab notebook
building-expansion -> Enhancing Environmental Enforcement with Near Real-Time Monitoring: Likelihood-Based Detection of Structural Expansion of Intensive Livestock Farms
SaDL_CD -> Semantic-aware Dense Representation Learning for Remote Sensing Image Change Detection
EGCTNet_pytorch -> Building Change Detection Based on an Edge-Guided Convolutional Neural Network Combined with a Transformer
S2-cGAN -> S2-cGAN: Self-Supervised Adversarial Representation Learning for Binary Change Detection in Multispectral Images
A-loss-function-for-change-detection -> UAL: Unchanged Area Loss-Function for Change Detection Networks
IEEE_TGRS_SSTFormer -> Spectral–Spatial–Temporal Transformers for Hyperspectral Image Change Detection
DMINet -> Change Detection on Remote Sensing Images Using Dual-Branch Multilevel Intertemporal Network
AFCF3D-Net -> Adjacent-level Feature Cross-Fusion with 3D CNN for Remote Sensing Image Change Detection
DSAHRNet -> A Deeply Attentive High-Resolution Network for Change Detection in Remote Sensing Images
RDPNet -> RDP-Net: Region Detail Preserving Network for Change Detection
BGAAE_CD -> Bipartite Graph Attention Autoencoders for Unsupervised Change Detection Using VHR Remote Sensing Images
Unsupervised-Change-Detection -> Unsupervised Change Detection in Satellite Images Using Principal Component Analysis and k-Means Clustering
Metric-CD -> Deep Metric Learning for Unsupervised Change Detection in Remote Sensing Images
HANet-CD -> HANet: A hierarchical attention network for change detection with bi-temporal very-high-resolution remote sensing images
SRGCAE -> Unsupervised Multimodal Change Detection Based on Structural Relationship Graph Representation Learning
change_detection_onera_baselines -> Siamese version of U-Net baseline model
SiamCRNN -> Change Detection in Multisource VHR Images via Deep Siamese Convolutional Multiple-Layers Recurrent Neural Network
Graph-based methods for change detection in remote sensing images -> Graph Learning Based on Signal Smoothness Representation for Homogeneous and Heterogeneous Change Detection
AR-CDNet -> Towards Accurate and Reliable Change Detection of Remote Sensing Images via Knowledge Review and Online Uncertainty Estimation
CICNet -> Compact Intertemporal Coupling Network for Remote Sensing Change Detection
BGINet -> Remote Sensing Image Change Detection with Graph Interaction
DSNUNet -> DSNUNet: An Improved Forest Change Detection Network by Combining Sentinel-1 and Sentinel-2 Images
Forest-CD -> Forest-CD: Forest Change Detection Network Based on VHR Images
S3Net_CD -> Superpixel-Guided Self-Supervised Learning Network for Change Detection in Multitemporal Image Change Detection
T-UNet -> T-UNet: Triplet UNet for Change Detection in High-Resolution Remote Sensing Images
UCDFormer -> UCDFormer: Unsupervised Change Detection Using a Transformer-driven Image Translation
satellite-change-events -> Change Event Dataset for Discovery from Spatio-temporal Remote Sensing Imagery, uses Sentinel 2 CaiRoad & CalFire datasets
CACo -> Change-Aware Sampling and Contrastive Learning for Satellite Images
LightCDNet -> LightCDNet: Lightweight Change Detection Network Based on VHR Images
OpenMineChangeDetection -> Characterising Open Cast Mining from Satellite Data (Sentinel 2), implements TinyCD, LSNet & DDPM-CD
multi-task-L-UNet -> A Deep Multi-Task Learning Framework Coupling Semantic Segmentation and Fully Convolutional LSTM Networks for Urban Change Detection. Applied to SpaceNet7 dataset
urban_change_detection -> Detecting Urban Changes With Recurrent Neural Networks From Multitemporal Sentinel-2 Data. fabric is another implementation
UNetLSTM -> Detecting Urban Changes With Recurrent Neural Networks From Multitemporal Sentinel-2 Data
SDACD -> An End-to-end Supervised Domain Adaptation Framework for Cross-domain Change Detection
CycleGAN-Based-DA-for-CD -> CycleGAN-based Domain Adaptation for Deforestation Detection
CGNet-CD -> Change Guiding Network: Incorporating Change Prior to Guide Change Detection in Remote Sensing Imagery
PA-Former -> PA-Former: Learning Prior-Aware Transformer for Remote Sensing Building Change Detection
AERNet -> AERNet: An Attention-Guided Edge Refinement Network and a Dataset for Remote Sensing Building Change Detection (HRCUS-CD)
S1GFlood-Detection -> DAM-Net: Global Flood Detection from SAR Imagery Using Differential Attention Metric-Based Vision Transformers. Includes S1GFloods dataset
Changen -> Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process
TTP -> Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection
SAM-CD -> Adapting Segment Anything Model for Change Detection in HR Remote Sensing Images
SCanNet -> Joint Spatio-Temporal Modeling for Semantic Change Detection in Remote Sensing Images
ELGC-Net -> Efficient Local-Global Context Aggregation for Remote Sensing Change Detection
Official_Remote_Sensing_Mamba -> RS-Mamba for Large Remote Sensing Image Dense Prediction
ChangeMamba -> Remote Sensing Change Detection with Spatio-Temporal State Space Model
ClearSCD -> Comprehensively leveraging semantics and change relationships for semantic change detection in high spatial resolution remote sensing imagery
RSCaMa -> Remote Sensing Image Change Captioning with State Space Model
ChangeBind -> A Hybrid Change Encoder for Remote Sensing Change Detection
OctaveNet -> An efficient multi-scale pseudo-siamese network for change detection in remote sensing images
MaskCD -> A Remote Sensing Change Detection Network Based on Mask Classification
I3PE -> Exchange means change: an unsupervised single-temporal change detection framework based on intra- and inter-image patch exchange
BDANet -> Multiscale Convolutional Neural Network with Cross-directional Attention for Building Damage Assessment from Satellite Images
BAN -> A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection
ubdd -> Learning Efficient Unsupervised Satellite Image-based Building Damage Detection, uses xView2
SGSLN -> Exchanging Dual-Encoder–Decoder: A New Strategy for Change Detection With Semantic Guidance and Spatial Localization
ChangeViT -> Unleashing Plain Vision Transformers for Change Detection
pytorch-change-models -> out-of-box contemporary spatiotemporal change model implementations, standard metrics, and datasets
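Several entries above (e.g. ChangeDetectionPCAKmeans, Unsupervised-Change-Detection) implement unsupervised change detection with principal component analysis and k-means clustering. Below is a simplified sketch of that idea for single-band images. It takes the absolute difference image and builds an h x h neighbourhood vector for every pixel. It then projects these vectors onto the leading principal components and splits them into two clusters, labelling the higher-difference cluster as "changed". As a simplification, this sketch computes principal components over every pixel's neighbourhood rather than over non-overlapping blocks, so it should not be read as a faithful reimplementation of any listed repository. `h`, `n_components` and `n_iter` are illustrative defaults.

```python
import numpy as np


def pca_kmeans_change(img1, img2, h=3, n_components=3, n_iter=20):
    """Unsupervised binary change map from two co-registered single-band images."""
    diff = np.abs(img2.astype(float) - img1.astype(float))
    pad = h // 2
    padded = np.pad(diff, pad, mode="edge")
    H, W = diff.shape
    # One h*h-dimensional vector per pixel, built from its neighbourhood
    windows = np.lib.stride_tricks.sliding_window_view(padded, (h, h))
    patches = windows.reshape(H * W, h * h)
    centred = patches - patches.mean(axis=0)
    # Principal directions from the SVD of the centred data
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    feats = centred @ vt[:n_components].T
    # 2-means, initialised with the lowest- and highest-difference pixels
    order = np.argsort(diff.ravel())
    centroids = feats[[order[0], order[-1]]].copy()
    for _ in range(n_iter):
        dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centroids[k] = feats[labels == k].mean(axis=0)
    # The cluster with the larger mean difference is labelled "changed"
    flat = diff.ravel()
    means = [flat[labels == k].mean() if np.any(labels == k) else -np.inf for k in range(2)]
    changed = int(np.argmax(means))
    return (labels == changed).reshape(H, W)


rng = np.random.default_rng(1)
before = rng.normal(0.5, 0.05, (20, 20))
after = before + rng.normal(0.0, 0.01, (20, 20))
after[7:13, 7:13] += 1.0  # simulated changed region
change_map = pca_kmeans_change(before, after)
```

Because features come from neighbourhoods, pixels just outside a changed region can be pulled into the "changed" cluster. The neighbourhood size `h` trades robustness to noise against boundary precision.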
![Time series](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/time-series.png)
The analysis of time series observations in remote sensing data has numerous applications, including enhancing the accuracy of classification models and forecasting future patterns and events. Image source. Note: since classifying crops and predicting crop yield are such prominent use cases for time series data, these tasks have dedicated sections after this one.
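One common way time series improve classification is by summarising each pixel's temporal profile into phenology-style features that a classifier can consume. This is a minimal sketch, assuming co-registered (T, H, W) red and near-infrared reflectance stacks. The feature set (NDVI maximum, minimum, seasonal amplitude, index of the peak acquisition, mean) is an illustrative choice, not one prescribed by any tool listed here.

```python
import numpy as np


def ndvi(red, nir, eps=1e-6):
    """Normalized Difference Vegetation Index; eps guards against division by zero."""
    return (nir - red) / (nir + red + eps)


def temporal_features(red_stack, nir_stack):
    """Per-pixel temporal features from (T, H, W) red and NIR stacks -> (H, W, 5)."""
    series = ndvi(red_stack, nir_stack)  # (T, H, W)
    return np.stack([
        series.max(axis=0),
        series.min(axis=0),
        series.max(axis=0) - series.min(axis=0),  # seasonal amplitude
        series.argmax(axis=0).astype(float),      # index of the peak acquisition
        series.mean(axis=0),
    ], axis=-1)


# 4 acquisitions of a 1x2 scene: a vegetated pixel that greens up, and bare soil
red_stack = np.full((4, 1, 2), 0.1)
nir_stack = np.array([[[0.2, 0.1]], [[0.4, 0.1]], [[0.6, 0.1]], [[0.3, 0.1]]])
feats = temporal_features(red_stack, nir_stack)
```

Each pixel's feature vector can then be passed to any per-pixel classifier. The deep models listed above (e.g. temporal CNNs and attention encoders) instead learn such summaries directly from the raw sequence.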
LANDSAT Time Series Analysis for Multi-temporal Land Cover Classification using Random Forest
temporalCNN -> Temporal Convolutional Neural Network for the Classification of Satellite Image Time Series
pytorch-psetae -> Satellite Image Time Series Classification with Pixel-Set Encoders and Temporal Self-Attention
satflow -> optical flow models for predicting future satellite images from current and past ones
esa-superresolution-forecasting -> Forecasting air pollution using ESA Sentinel-5p data, and an encoder-decoder convolutional LSTM neural network architecture
lightweight-temporal-attention-pytorch -> Light Temporal Attention Encoder (L-TAE) for satellite image time series
dtwSat -> Time-Weighted Dynamic Time Warping for satellite image time series analysis
MTLCC -> Multitemporal Land Cover Classification Network. A recurrent neural network approach to encode multi-temporal data for land cover classification
PWWB -> Real-Time Spatiotemporal Air Pollution Prediction with Deep Convolutional LSTM through Satellite Image Analysis
spaceweather -> predicting geomagnetic storms from satellite measurements of the solar wind and solar corona, uses LSTMs
Forest_wildfire_spreading_convLSTM -> Modeling of the spreading of forest wildfire using a neural network with ConvLSTM cells. Prediction 3-days forward
ConvTimeLSTM -> Extension of ConvLSTM and Time-LSTM for irregularly spaced images, appropriate for Remote Sensing
dl-time-series -> Deep Learning algorithms applied to characterization of Remote Sensing time-series
tpe -> Generalized Classification of Satellite Image Time Series With Thermal Positional Encoding
wildfire_forecasting -> Deep Learning Methods for Daily Wildfire Danger Forecasting. Uses ConvLSTM
satellite_image_forecasting -> predict future satellite images from past ones using features such as precipitation and elevation maps. Entry for the EarthNet2021 challenge
Deep Learning for Cloud Gap-Filling on Normalized Difference Vegetation Index using Sentinel Time-Series -> A CNN-RNN based model that identifies correlations between optical and SAR data and exports dense Normalized Difference Vegetation Index (NDVI) time series at a fixed 6-day temporal resolution, which can be used for event detection tasks
DeepSatModels -> ViTs for SITS: Vision Transformers for Satellite Image Time Series
Presto -> Lightweight, Pre-trained Transformers for Remote Sensing Timeseries
LULC mapping using time series data & spectral bands -> uses 1D convolutions that learn from time-series data. Accompanies blog post: Time-Traveling Pixels: A Journey into Land Use Modeling
hurricane-net -> A deep learning framework for forecasting Atlantic hurricane trajectory and intensity.
CAPES -> Construction changes are detected using the U-net model and satellite time series
Exchanger4SITS -> Rethinking the Encoding of Satellite Image Time Series
Rapid Wildfire Hotspot Detection Using Self-Supervised Learning on Temporal Remote Sensing Data
stenn-pytorch -> A Spatio-temporal Encoding Neural Network for Semantic Segmentation of Satellite Image Time Series
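Several entries above (for example dtwSat) compare pixel time series with dynamic time warping (DTW), which aligns two sequences that may be shifted or stretched in time before measuring their distance. The sketch below is plain DTW with an absolute-difference cost; it is not the time-weighted variant that dtwSat implements, and the NDVI values are made up for illustration.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two 1-D series.

    Local cost is the absolute difference; allowed steps are match,
    insertion and deletion. No time weighting is applied.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances
                                     cost[i][j - 1],      # b advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


reference = [0.2, 0.3, 0.6, 0.8, 0.5, 0.2]  # hypothetical NDVI profile of a crop
shifted = [0.2, 0.2, 0.3, 0.6, 0.8, 0.5]    # same profile, one acquisition later
lockstep = sum(abs(x - y) for x, y in zip(reference, shifted))

print(round(dtw_distance(reference, shifted), 3))  # 0.3
print(round(lockstep, 3))                           # 1.2
```

The warping absorbs the one-step delay, so the DTW distance is much smaller than the lock-step, date-by-date difference.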
Crop classification
![Crop classification](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/crops.jpg)
Crop classification in remote sensing is the identification and mapping of different crops in images or sequences of images. It aims to provide insight into the distribution and composition of crops in a specific area, with applications that include monitoring crop growth and evaluating crop damage. Both traditional machine learning methods, such as decision trees and support vector machines, and deep learning techniques, such as convolutional neural networks (CNNs), can be used to perform crop classification. The optimal method depends on the size and complexity of the dataset, the desired accuracy, and the available computational resources. Whichever method is chosen, the success of crop classification relies heavily on the quality and resolution of the input data, as well as the availability of labeled training data. Image source: High resolution satellite imaging sensors for precision agriculture by Chenghai Yang
Classification of Crop Fields through Satellite Image Time Series -> using pytorch-psetae and Sentinel-2 data
CropDetectionDL -> using GRU-net, first-place solution for the Crop Detection from Satellite Imagery competition organized by the CV4A workshop at ICLR 2020
Radiant-Earth-Spot-the-Crop-Challenge -> The main objective of this challenge was to use time-series of Sentinel-2 multi-spectral data to classify crops in the Western Cape of South Africa. The challenge was to build a machine learning model to predict crop type classes for the test dataset
CropMappingInterpretation -> An interpretation pipeline towards understanding multi-temporal deep learning approaches for crop mapping
timematch -> A method to perform unsupervised cross-region adaptation of crop classifiers trained with satellite image time series. We also introduce an open-access dataset for cross-region adaptation with SITS from four different regions in Europe
elects -> End-to-End Learned Early Classification of Time Series for In-Season Crop Type Mapping
3d-fpn-and-time-domain -> Sentinel 2 Time Series Analysis with 3D Feature Pyramid Network and Time Domain Class Activation Intervals for Crop Mapping
in-season-and-dynamic-crop-mapping -> In-season and dynamic crop mapping using 3D convolution neural networks and sentinel-2 time series, uses the Lombardy crop dataset
MultiviewCropClassification -> A Comparative Assessment of Multi-view Fusion Learning for Crop Classification
Detection of manure application on crop fields leveraging satellite data and Machine Learning
StressNet: A spatial-spectral-temporal deformable attention-based framework for water stress classification in maize -> Water Stress Classification on Multispectral data of Maize captured by UAV
XAI4EO -> Towards Explainable AI4EO: an explainable DL approach for crop type mapping using SITS
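Classical per-pixel classifiers such as decision trees or support vector machines usually expect a table with one row per pixel and one column per band and date. A minimal NumPy sketch of that reshaping, assuming the time series is stored as a (time, band, height, width) array; the function names and the random test cube are hypothetical.

```python
import numpy as np


def sits_to_pixel_table(cube: np.ndarray) -> np.ndarray:
    """Flatten a (T, B, H, W) time-series cube into an (H*W, T*B) table.

    Row k holds every band at every date for pixel k (row-major order);
    column t*B + b is band b on date t.
    """
    if cube.ndim != 4:
        raise ValueError("expected a (time, band, height, width) array")
    t, b, h, w = cube.shape
    # Move the spatial axes to the front, then merge (T, B) into one feature axis.
    return cube.transpose(2, 3, 0, 1).reshape(h * w, t * b)


def pixel_table_to_map(labels: np.ndarray, height: int, width: int) -> np.ndarray:
    """Reshape one predicted label per pixel back into an (H, W) map."""
    return labels.reshape(height, width)


rng = np.random.default_rng(0)
cube = rng.random((6, 4, 32, 32))  # hypothetical: 6 dates, 4 bands, 32x32 pixels
X = sits_to_pixel_table(cube)
print(X.shape)  # (1024, 24)
```

`X` can then be passed to any tabular classifier, and its per-pixel predictions turned back into a crop map with `pixel_table_to_map`.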
Crop yield
![Crop yield](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/yield.png)
Crop yield is a crucial metric in agriculture, as it determines the productivity and profitability of a farm. It is defined as the amount of crops produced per unit area of land and is influenced by a range of factors including soil fertility, weather conditions, the type of crop grown, and pest and disease control. By utilizing time series of satellite images, it is possible to perform accurate crop type classification and take advantage of the seasonal variations specific to certain crops. This information can be used to optimize crop management practices and ultimately improve crop yield. However, to achieve accurate results, it is essential to consider the quality and resolution of the input data, as well as the availability of labeled training data. Appropriate pre-processing and feature extraction techniques must also be employed. Image source.
Crop yield Prediction with Deep Learning -> Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data
Deep-Transfer-Learning-Crop-Yield-Prediction
Crop-Yield-Prediction-using-ML -> A simple web application that gives farmers/users an estimate of how much crop yield will be produced from the given inputs
Building a Crop Yield Prediction App in Senegal Using Satellite Imagery and Jupyter Voila
Crop Yield Prediction Using Deep Neural Networks and LSTM
Deep transfer learning techniques for crop yield prediction, published in COMPASS 2018
Understanding crop yield predictions from CNNs
Advanced Deep Learning Techniques for Predicting Maize Crop Yield using Sentinel-2 Satellite Imagery
pycrop-yield-prediction -> Deep Gaussian Process for Crop Yield Prediction
PredictYield -> using data scraped from Google Earth Engine, this predicts the yield of Corn, Soybean, and Wheat in the USA with Keras
Crop-Yield-Prediction-and-Estimation-using-Time-series-remote-sensing-data
Yield-Prediction-Using-Sentinel-Data
SPACY -> Satellite Prediction of Aggregate Corn Yield
cropyieldArticle -> Scalable Crop Yield Prediction with Sentinel-2 Time Series and Temporal Convolutional Network
CNN-RNN-Yield-Prediction -> A CNN-RNN Framework for Crop Yield Prediction
Yield-Prediction-DNN -> Crop Yield Prediction Using Deep Neural Networks
MMST-ViT -> MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer. This paper utilizes the Tiny CropNet dataset
Greenearthnet -> Multi-modal learning for geospatial vegetation forecasting
crop-forecasting -> Predicting rice field yields
SICKLE -> A Multi-Sensor Satellite Imagery Dataset Annotated with Multiple Key Cropping Parameters. Baseline solutions: U-TAE, U-Net3D and ConvLSTM
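To exploit the seasonal signal mentioned above, yield models are often fed vegetation-index summaries of each field's growing season. A hedged sketch: NDVI is computed as (NIR - Red) / (NIR + Red), and the three summary features (peak value, day of peak, area under the curve) are illustrative choices, not the feature set of any repository listed here.

```python
import numpy as np


def ndvi(nir, red) -> np.ndarray:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    out = np.zeros_like(denom)
    # Where NIR + Red is zero (e.g. no-data pixels), leave NDVI at 0 instead of dividing by zero.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def seasonal_features(ndvi_series, days) -> dict:
    """Summarise one field's NDVI time series (hypothetical feature set)."""
    ndvi_series = np.asarray(ndvi_series, dtype=np.float64)
    days = np.asarray(days, dtype=np.float64)
    peak = int(np.argmax(ndvi_series))
    # Trapezoidal area under the NDVI curve: a rough proxy for season-long greenness.
    area = float(np.sum((ndvi_series[1:] + ndvi_series[:-1]) / 2.0 * np.diff(days)))
    return {
        "peak_ndvi": float(ndvi_series[peak]),
        "peak_day": float(days[peak]),
        "integrated_ndvi": area,
    }


features = seasonal_features([0.2, 0.8, 0.4], days=[0, 10, 20])
print(features["peak_ndvi"], features["peak_day"])  # 0.8 10.0
```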
Wealth and economic activity
![Wealth and economic activity](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/economic.png)
The traditional approach of collecting economic data through ground surveys is a time-consuming and resource-intensive process. However, advancements in satellite technology and machine learning offer an alternative solution. By utilizing satellite imagery and applying machine learning algorithms, it is possible to obtain accurate and current information on economic activity with greater efficiency. This shift towards satellite imagery-based forecasting not only provides cost savings but also offers a wider and more comprehensive perspective of economic activity. As a result, it is poised to become a valuable asset for both policymakers and businesses. Image source.
Using publicly available satellite imagery and deep learning to understand economic well-being in Africa, Nature Comms 22 May 2020 -> used a CNN on Landsat (daytime) and nighttime imagery to predict the asset wealth of African villages
satellite_led_liverpool -> Remote Sensing-Based Measurement of Living Environment Deprivation - Improving Classical Approaches with Machine Learning
Predicting_Energy_Consumption_With_Convolutional_Neural_Networks
SustainBench -> Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning
Measuring the Impacts of Poverty Alleviation Programs with Satellite Imagery and Deep Learning
Building a Spatial Model to Classify Global Urbanity Levels -> estimate global urbanity levels from population data, nighttime lights and road networks
deeppop -> Deep Learning Approach for Population Estimation from Satellite Imagery, also on Github
Estimating telecoms demand in areas of poor data availability
satimage -> Code and models for the manuscript "Predicting Poverty and Developmental Statistics from Satellite Images using Multi-task Deep Learning". Predict the main material of a roof, source of lighting and source of drinking water for properties, from satellite imagery
africa_poverty -> Using publicly available satellite imagery and deep learning to understand economic well-being in Africa
Predicting-Poverty -> Combining satellite imagery and machine learning to predict poverty, in PyTorch
income-prediction -> Predicting average yearly income based on satellite imagery using CNNs, uses pytorch
urban_score -> Learning to score economic development from satellite imagery
READ -> Lightweight and robust representation of economic scales from satellite imagery
Slum-classification -> Binary classification on a very high-resolution satellite image in case of mapping informal settlements using unet
Predicting_Poverty -> uses daytime satellite imagery and the luminosity of nighttime satellite images
Cancer-Prevalence-Satellite-Images -> Predict Health Outcomes from Features of Satellite Images
Mapping Poverty in Bangladesh with Satellite Images and Deep Learning -> combines health data with OpenStreetMap data and nighttime and daytime satellite imagery
Population Estimation from Satellite Imagery
Deep_Learning_Satellite_Imd -> Using Deep Learning on Satellite Imagery to predict population and economic indicators
Disaster response
![Disaster response](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/disaster.png)
Remote sensing images are used in disaster response to identify and assess damage to an area. This imagery can be used to detect buildings that are damaged or destroyed, identify roads and road networks that are blocked, determine the size and shape of a disaster area, and identify areas that are at risk of flooding. Remote sensing images can also be used to detect and monitor the spread of forest fires and monitor vegetation health. Also check out the sections on change detection and water/fire/building segmentation. Image source.
DisaVu -> combines building & damage detection and provides an app for viewing predictions
Soteria -> uses machine learning with satellite imagery to map natural disaster impacts for faster emergency response
DisasterHack -> Wildfire Mitigation: Computer Vision Identification of Hazard Fuels Using Landsat
forestcasting -> Forest fire prediction powered by analytics
Machine Learning-based Damage Assessment for Disaster Relief on Google AI blog -> uses object detection to locate buildings, then a classifier to determine if a building is damaged. Generalising is challenging due to the small dataset
hurricane_damage -> Post-hurricane structure damage assessment based on aerial imagery with CNN
rescue -> code of the paper: Attention to fires: multi-channel deep-learning models for wildfire severity prediction
Disaster-Classification -> A disaster classification model to predict the type of disaster given an input image
Coarse-to-fine weakly supervised learning method for green plastic cover segmentation
Detection of destruction in satellite imagery
BDD-Net -> A General Protocol for Mapping Buildings Damaged by a Wide Range of Disasters Based on Satellite Imagery
building-segmentation-disaster-resilience -> 2nd place solution in the Open Cities AI Challenge: Segmenting Buildings for Disaster Resilience
Flooding Damage Detection from Post-Hurricane Satellite Imagery Based on Convolutional Neural Networks
IBM-Disaster-Response-Hack -> identifying optimal terrestrial routes through calamity-stricken areas. Satellite image data informs road condition assessment and obstruction detection
Hurricane-Damage-Detection -> Waterloo's Hack the North 2020++ submission. A convolutional neural network model used to detect hurricane damage in RGB satellite images
Satellite Image Analysis with fast.ai for Disaster Recovery
shackleton -> leverages remote sensing imagery and machine learning techniques to provide insights into various transportation and evacuation scenarios in an interactive dashboard that conducts real-time computation
ai-vegetation-fuel -> Predicting Fuel Load from earth observation data using Machine Learning, using LightGBM & CatBoost
AI Helps Detect Disaster Damage From Satellite Imagery -> NVIDIA blog post
Turkey-Earthquake-2023-Building-Change-Detection -> The repository contains building footprints derived from Maxar open data imagery and change detection results by blackshark-ai
MS4D-Net-Building-Damage-Assessment -> MS4D-Net: Multitask-Based Semi-Supervised Semantic Segmentation Framework with Perturbed Dual Mean Teachers for Building Damage Assessment from High-Resolution Remote Sensing Imagery
DAHiTra -> Large-scale Building Damage Assessment using a Novel Hierarchical Transformer Architecture on Satellite Images. Uses xView2 xBD dataset
skai -> a machine learning based tool from Google for performing automatic building damage assessments on aerial imagery of disaster sites.
building-damage-assessment-cnn-siamese -> from the Microsoft AI for Good Lab
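A common starting point for damage mapping is to compare co-registered pre- and post-event images. The sketch below is a simple image-differencing baseline with a hypothetical threshold of mean plus `k` standard deviations of the difference image; it is not the method of any repository above, and the learned models listed there are far more robust to illumination, seasonal and registration differences.

```python
import numpy as np


def change_mask(pre: np.ndarray, post: np.ndarray, k: float = 2.0) -> np.ndarray:
    """Flag pixels whose change between two co-registered images is unusually large.

    pre, post: (H, W) or (B, H, W) arrays on the same grid and radiometric scale.
    A pixel is flagged when its change magnitude exceeds mean + k * std of the
    difference image (a simple, hypothetical threshold rule).
    """
    diff = np.abs(post.astype(np.float64) - pre.astype(np.float64))
    if diff.ndim == 3:
        # Collapse bands into one Euclidean change magnitude per pixel.
        diff = np.sqrt((diff ** 2).sum(axis=0))
    threshold = diff.mean() + k * diff.std()
    return diff > threshold


pre = np.zeros((10, 10))
post = pre.copy()
post[3:5, 3:5] = 1.0  # hypothetical 2x2 patch that changed after the event
mask = change_mask(pre, post)
print(int(mask.sum()))  # 4
```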
Super-resolution
![Super-resolution](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/super-res.jpg)
Super-resolution is a technique aimed at improving the resolution of an imaging system. This process can be applied prior to other image processing steps to increase the visibility of small objects or boundaries. Despite its potential benefits, the use of super-resolution is controversial due to the possibility of introducing artifacts that could be mistaken for real features. Super-resolution techniques are broadly categorized into two groups: single image super-resolution (SISR) and multi-image super-resolution (MISR). SISR focuses on enhancing the resolution of a single image, while MISR utilizes multiple images of the same scene to create a high-resolution output. Each approach has its own advantages and limitations, and the choice of method depends on the specific application and desired outcome. Image source.
Multi-image super-resolution (MISR)
Note that nearly all the MISR publications resulted from the PROBA-V Super Resolution competition
deepsum -> Deep neural network for Super-resolution of Unregistered Multitemporal images (ESA PROBA-V challenge)
3DWDSRNet -> Satellite Image Multi-Frame Super Resolution (MISR) Using 3D Wide-Activation Neural Networks
RAMS -> Multi-Image Super Resolution of Remotely Sensed Images Using Residual Attention Deep Neural Networks
TR-MISR -> Transformer-based MISR framework for the PROBA-V super-resolution challenge. With paper
HighRes-net -> Pytorch implementation of HighRes-net, a neural network for multi-frame super-resolution, trained and tested on the European Space Agency’s Kelvin competition
ProbaVref -> Repurposing the Proba-V challenge for reference-aware super resolution
The missing ingredient in deep multi-temporal satellite image super-resolution -> Permutation invariance harnesses the power of ensembles in a single model, with repo piunet
MSTT-STVSR -> Space-time Super-resolution for Satellite Video: A Joint Framework Based on Multi-Scale Spatial-Temporal Transformer, JAG, 2022
Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites
DDRN -> Deep Distillation Recursive Network for Video Satellite Imagery Super-Resolution
worldstrat -> SISR and MISR implementations of SRCNN
MISR-GRU -> Pytorch implementation of MISR-GRU, a deep neural network for multi image super-resolution (MISR), for ProbaV Super Resolution Competition
MSDTGP -> Satellite Video Super-Resolution via Multiscale Deformable Convolution Alignment and Temporal Grouping Projection
proba-v-super-resolution-challenge -> Solution to ESA's satellite imagery super resolution challenge
PROBA-V-Super-Resolution -> solution using a custom deep learning architecture
satlas-super-resolution -> Satlas Super Resolution: model is an adaptation of ESRGAN, with changes that allow the input to be a time series of Sentinel-2 images.
MISR Remote Sensing SRGAN -> PyTorch SRGAN for RGB Remote Sensing imagery, performing both SISR and MISR. MISR implementation inspired by RecursiveNet (HighResNet). Includes pretrained checkpoints.
MISR-S2 -> Cross-sensor super-resolution of irregularly sampled Sentinel-2 time series
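The simplest MISR baseline is shift-and-add: each low-resolution frame is placed onto a finer grid at its sub-pixel offset, and overlapping contributions are averaged. This sketch assumes the offsets are known integers on the high-resolution grid, which real PROBA-V-style data do not provide (shifts must be estimated and clouds masked); that gap is what the learned methods above address. The function name and the simulated frames are hypothetical.

```python
import numpy as np


def shift_and_add(frames, shifts, scale):
    """Naive shift-and-add multi-image super-resolution.

    frames: list of (h, w) low-resolution arrays of the same scene.
    shifts: list of (dy, dx) integer offsets of each frame on the
            high-resolution grid, with 0 <= dy, dx < scale (assumed known).
    Returns an (h*scale, w*scale) array; HR pixels no frame covers stay NaN.
    """
    h, w = frames[0].shape
    acc = np.zeros((h * scale, w * scale))
    count = np.zeros_like(acc)
    for frame, (dy, dx) in zip(frames, shifts):
        # LR pixel (i, j) lands on HR pixel (i*scale + dy, j*scale + dx).
        acc[dy::scale, dx::scale] += frame
        count[dy::scale, dx::scale] += 1
    out = np.full_like(acc, np.nan)
    np.divide(acc, count, out=out, where=count > 0)
    return out


hr = np.arange(16, dtype=np.float64).reshape(4, 4)  # hypothetical "true" scene
scale = 2
shifts = [(0, 0), (0, 1), (1, 0), (1, 1)]
frames = [hr[dy::scale, dx::scale] for dy, dx in shifts]  # simulated LR frames
recon = shift_and_add(frames, shifts, scale)
print(np.array_equal(recon, hr))  # True
```

With all four offsets present the scene is recovered exactly; with fewer frames, some high-resolution pixels remain empty.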
Single image super-resolution (SISR)
Swin2-MoSE -> Swin2-MoSE: A New Single Image Super-Resolution Model for Remote Sensing
sentinel2_superresolution -> Super-resolution of 10 Sentinel-2 bands to 5-meter resolution, starting from L1C or L2A (Theia format) products. Trained on Sen2Venµs
Super Resolution for Satellite Imagery - srcnn repo
TensorFlow implementation of "Accurate Image Super-Resolution Using Very Deep Convolutional Networks" adapted for working with geospatial data
Random Forest Super-Resolution (RFSR repo) including sample data
Enhancing Sentinel 2 images by combining Deep Image Prior and Decrappify. Repo for deep-image-prior and article on decrappify
Image Super-Resolution using an Efficient Sub-Pixel CNN -> the Keras docs have a great tutorial on this lightweight but well-performing model
super-resolution-using-gan -> Super-Resolution of Sentinel-2 Using Generative Adversarial Networks
Super-resolution of Multispectral Satellite Images Using Convolutional Neural Networks
Multi-temporal Super-Resolution on Sentinel-2 Imagery using HighRes-Net, repo
SSPSR-Pytorch -> A spatial-spectral prior deep network for single hyperspectral image super-resolution
Sentinel-2 Super-Resolution: High Resolution For All (Bands)
CinCGAN -> Unsupervised Image Super-Resolution using Cycle-in-Cycle Generative Adversarial Networks
Satellite-image-SRGAN using PyTorch
EEGAN -> Edge Enhanced GAN For Remote Sensing Image Super-Resolution, TensorFlow 1.1
PECNN -> A Progressively Enhanced Network for Video Satellite Imagery Super-Resolution, minimal documentation
hs-sr-tvtv -> Enhanced Hyperspectral Image Super-Resolution via RGB Fusion and TV-TV Minimization
sr4rs -> Super resolution for remote sensing, with pre-trained model for Sentinel-2, SRGAN-inspired
Restoring old aerial images with Deep Learning -> Medium article on Super Resolution with Perceptual Loss function and real images as input
RFSR_TGRS -> Hyperspectral Image Super-Resolution via Recurrent Feedback Embedding and Spatial-Spectral Consistency Regularization
SEN2VENµS -> a dataset for the training of Sentinel-2 super-resolution algorithms. With paper
TransENet -> Transformer-based Multi-Stage Enhancement for Remote Sensing Image Super-Resolution
SG-FBGAN -> Remote Sensing Image Super-Resolution via Saliency-Guided Feedback GANs
finetune_ESRGAN -> finetune the ESRGAN super resolution generator for remote sensing images and video
MIP -> Unsupervised Remote Sensing Super-Resolution via Migration Image Prior
Optical-RemoteSensing-Image-Resolution -> Deep Memory Connected Neural Network for Optical Remote Sensing Image Restoration. Two applications: Gaussian image denoising and single image super-resolution
HSENet -> Hybrid-Scale Self-Similarity Exploitation for Remote Sensing Image Super-Resolution
SR_RemoteSensing -> Super-Resolution deep learning models for remote sensing data based on BasicSR
RSI-Net -> A Deep Multi-task Convolutional Neural Network for Remote Sensing Image Super-resolution and Colorization
EDSR-Super-Resolution -> EDSR model using PyTorch applied to satellite imagery
CycleCNN -> Nonpairwise-Trained Cycle Convolutional Neural Network for Single Remote Sensing Image Super-Resolution
SISR with Real-World Degradation Modeling -> Single-Image Super Resolution of Remote Sensing Images with Real-World Degradation Modeling
pixel-smasher -> Super-Resolution Surface Water Mapping on the Canadian Shield Using Planet CubeSat Images and a Generative Adversarial Network
satellite-image-super-resolution -> A Comparative Study on CNN-Based Single-Image Super-Resolution Techniques for Satellite Images
SatelliteSR -> comparison of a number of techniques on the DOTA dataset
Image-Super-Resolution -> Super resolution RESNET network
Unsupervised Super Resolution for Sentinel-2 satellite imagery -> using Deep Image Prior (DIP), Zero-Shot Super Resolution (ΖSSR) & Degradation-Aware Super Resolution (DASR)
Spectral Super-Resolution of Satellite Imagery with Generative Adversarial Networks
Super resolution using GAN / 4x Improvement -> applied to Sentinel 2
rs-esrgan -> RS-ESRGAN: Super-Resolution of Sentinel-2 Imagery Using Generative Adversarial Networks
TS-RSGAN -> Super-Resolution of Remote Sensing Images for ×4 Resolution without Reference Images. Applied to Sentinel-2
CDCR -> Combining Discrete and Continuous Representation: Scale-Arbitrary Super-Resolution for Satellite Images
FunSR -> Continuous Remote Sensing Image Super-Resolution based on Context Interaction in Implicit Function Space
HAUNet_RSISR -> Hybrid Attention-Based U-Shaped Network for Remote Sensing Image Super-Resolution
L1BSR -> Exploiting Detector Overlap for Self-Supervised SISR of Sentinel-2 L1B Imagery
Deep-Harmonization -> Deep Learning-based Harmonization and Super-Resolution of Landsat-8 and Sentinel-2 images
SGDM -> Semantic Guided Large Scale Factor Remote Sensing Image Super-resolution with Generative Diffusion Prior
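The Efficient Sub-Pixel CNN mentioned in the list upscales by predicting C × r × r channels at low resolution and rearranging them into an image r times larger (a "pixel shuffle" or depth-to-space step). A NumPy sketch of that rearrangement only, with no learned layers; the channel ordering is chosen to match PyTorch's `torch.nn.PixelShuffle` as we understand it, while TensorFlow's `depth_to_space` (channels-last) orders channels differently.

```python
import numpy as np


def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r*r, H, W) array into a (C, H*r, W*r) array.

    Output pixel (c, h*r + i, w*r + j) is taken from input channel
    c*r*r + i*r + j at position (h, w).
    """
    c_r2, h, w = x.shape
    if c_r2 % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c = c_r2 // (r * r)
    # (C, r, r, H, W) -> (C, H, r, W, r) -> (C, H*r, W*r)
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)


x = np.arange(8).reshape(4, 1, 2)  # 4 = 1 output channel * 2 * 2
print(pixel_shuffle(x, 2).tolist())  # [[[0, 2, 1, 3], [4, 6, 5, 7]]]
```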
Super-resolution - Miscellaneous
The value of super resolution — real world use case -> Medium article on parcel boundary detection with super-resolved satellite imagery
Super-Resolution on Satellite Imagery using Deep Learning -> Nov 2016 blog post by CosmiQ Works with a nice introduction to the topic. Proposes and demonstrates a new architecture with perturbation layers, with practical guidance on the methodology and code. Three-part series
Introduction to spatial resolution
Awesome-Super-Resolution -> another 'awesome' repo, getting a little out of date now
Super-Resolution (python) Utilities for managing large satellite images
pytorch-enhance -> Library of Image Super-Resolution Models, Datasets, and Metrics for Benchmarking or Pretrained Use. Also check out this implementation in JAX
Super Resolution in OpenCV
AI-based Super resolution and change detection to enforce Sentinel-2 systematic usage -> Worldview-2 images (2m) were used to create a reference dataset and increase the spatial resolution of the Copernicus sensor from 10m to 5m
Model-Guided Deep Hyperspectral Image Super-resolution -> code accompanying the paper: Model-Guided Deep Hyperspectral Image Super-Resolution
Super-resolving beyond satellite hardware -> paper assessing SR performance in reconstructing realistically degraded satellite images
satellite-pixel-synthesis-pytorch -> PyTorch implementation of NeurIPS 2021 paper: Spatial-Temporal Super-Resolution of Satellite Imagery via Conditional Pixel Synthesis
SRE-HAN -> Squeeze-and-Residual-Excitation Holistic Attention Network improves super-resolution (SR) on remote-sensing imagery compared to other state-of-the-art attention-based SR models
satsr -> A project to perform super-resolution on multispectral images from any satellite, including Sentinel 2, Landsat 8, VIIRS & MODIS
OLI2MSI -> dataset for remote sensing imagery super-resolution composed of Landsat8-OLI and Sentinel2-MSI images
MMSR -> Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution
HSRnet -> Hyperspectral Image Super-resolution via Deep Spatio-spectral Attention Convolutional Neural Networks
RRSGAN -> RRSGAN: Reference-Based Super-Resolution for Remote Sensing Image
HDR-DSP-SR -> Self-supervised multi-image super-resolution for push-frame satellite images
GAN-HSI-SR -> Hyperspectral Image Super-Resolution by Band Attention Through Adversarial Learning
Pansharpening
![Pansharpening](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/pansharpen.png)
Pansharpening is a data fusion method that merges the high spatial detail from a high-resolution panchromatic image with the rich spectral information from a lower-resolution multispectral image. The result is a single, high-resolution color image that retains both the sharpness of the panchromatic band and the color information of the multispectral bands. This process enhances the spatial resolution while preserving the spectral qualities of the original images. Image source
Several algorithms are described in the ArcGIS docs, the simplest being to take the mean of the pan and RGB pixel values.
PGCU -> Probability-based Global Cross-modal Upsampling for Pansharpening
rio-pansharpen -> pansharpening Landsat scenes
Simple-Pansharpening-Algorithms
Working-For-Pansharpening -> long list of pansharpening methods and update of Awesome-Pansharpening
PSGAN -> A Generative Adversarial Network for Remote Sensing Image Pan-sharpening
Pansharpening-by-Convolutional-Neural-Network
PBR_filter -> Pansharpening by Background Removal algorithm for sharpening RGB images
py_pansharpening -> multiple algorithms implemented in python
Deep-Learning-PanSharpening -> deep-learning based pan-sharpening code package that reimplements PNN, MSDCNN, PanNet, TFNet and SRPPNN, alongside the authors' proposed network DIPNet
HyperTransformer -> A Textural and Spectral Feature Fusion Transformer for Pansharpening
DIP-HyperKite -> Hyperspectral Pansharpening Based on Improved Deep Image Prior and Residual Reconstruction
D2TNet -> A ConvLSTM Network with Dual-direction Transfer for Pan-sharpening
PanColorGAN-VHR-Satellite-Images -> Rethinking CNN-Based Pansharpening: Guided Colorization of Panchromatic Images via GANs
MTL_PAN_SEG -> Multi-task deep learning for satellite image pansharpening and segmentation
Z-PNN -> Pansharpening by convolutional neural networks in the full resolution framework
GTP-PNet -> GTP-PNet: A residual learning network based on gradient transformation prior for pansharpening
UDL -> Dynamic Cross Feature Fusion for Remote Sensing Pansharpening
PSData -> A Large-Scale General Pan-sharpening DataSet, which contains PSData3 (QB, GF-2, WV-3) and PSData4 (QB, GF-1, GF-2, WV-2).
AFPN -> Adaptive Detail Injection-Based Feature Pyramid Network For Pan-sharpening
pan-sharpening -> multiple methods demonstrated for multispectral and panchromatic images
PSGan-Family -> PSGAN: A Generative Adversarial Network for Remote Sensing Image Pan-Sharpening
PanNet-Landsat -> A Deep Network Architecture for Pan-Sharpening
DLPan-Toolbox -> Machine Learning in Pansharpening: A Benchmark, from Shallow to Deep Networks
LPPN -> Laplacian pyramid networks: A new approach for multispectral pansharpening
S2_SSC_CNN -> Zero-shot Sentinel-2 Sharpening Using A Symmetric Skipped Connection Convolutional Neural Network
S2S_UCNN -> Sentinel 2 sharpening using a single unsupervised convolutional neural network with MTF-Based degradation model
SSE-Net -> Spatial and Spectral Extraction Network With Adaptive Feature Fusion for Pansharpening
UCGAN -> Unsupervised Cycle-consistent Generative Adversarial Networks for Pan-sharpening
GCPNet -> When Pansharpening Meets Graph Convolution Network and Knowledge Distillation
PanFormer -> PanFormer: a Transformer Based Model for Pan-sharpening
Pansharpening -> Pansformers: Transformer-Based Self-Attention Network for Pansharpening
Sentinel-2 Band Pan-Sharpening
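The simple mean method described above, and the closely related Brovey transform, can be sketched in a few lines of NumPy. Assumptions: the multispectral and panchromatic images are co-registered, the pan grid is exactly `ratio` times finer, and nearest-neighbour upsampling is used for brevity (bilinear or cubic resampling is more usual). Band weights are omitted, and the function names are hypothetical.

```python
import numpy as np


def _upsample_nearest(ms: np.ndarray, ratio: int) -> np.ndarray:
    """Nearest-neighbour upsample a (B, h, w) multispectral array by an integer ratio."""
    return ms.repeat(ratio, axis=1).repeat(ratio, axis=2)


def _check_grid(ms_up: np.ndarray, pan: np.ndarray) -> None:
    if ms_up.shape[1:] != pan.shape:
        raise ValueError("pan must be exactly `ratio` times larger than the MS bands")


def simple_mean_pansharpen(ms, pan, ratio: int) -> np.ndarray:
    """Average each upsampled multispectral band with the panchromatic band."""
    ms_up = _upsample_nearest(np.asarray(ms, dtype=np.float64), ratio)
    pan = np.asarray(pan, dtype=np.float64)
    _check_grid(ms_up, pan)
    return (ms_up + pan[None]) / 2.0


def brovey_pansharpen(ms, pan, ratio: int) -> np.ndarray:
    """Unweighted Brovey transform: scale each band by pan / (sum of bands)."""
    ms_up = _upsample_nearest(np.asarray(ms, dtype=np.float64), ratio)
    pan = np.asarray(pan, dtype=np.float64)
    _check_grid(ms_up, pan)
    total = ms_up.sum(axis=0)
    gain = np.zeros_like(total)
    # Pixels where all bands are zero get a gain of 0 instead of a division by zero.
    np.divide(pan, total, out=gain, where=total != 0)
    return ms_up * gain[None]


ms = np.array([1.0, 2.0, 3.0]).reshape(3, 1, 1)  # hypothetical 3-band, 1x1 MS image
pan = np.full((2, 2), 12.0)                       # hypothetical 2x2 pan image
print(simple_mean_pansharpen(ms, pan, 2)[:, 0, 0])  # [6.5 7.  7.5]
print(brovey_pansharpen(ms, pan, 2)[:, 0, 0])       # [2. 4. 6.]
```

The Brovey output preserves the band ratios of the multispectral image while its band sum follows the pan intensity, which is why it sharpens well but can shift colours.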
Image-to-image translation
![Image-to-image translation](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/translation.png)
Image-to-image translation is a crucial aspect of computer vision that utilizes machine learning models to transform an input image into a new, distinct output image. In the field of remote sensing, it plays a significant role in bridging the gap between different imaging domains, such as converting Synthetic Aperture Radar (SAR) images into RGB (Red Green Blue) images. This technology has a wide range of applications, including improving image quality, filling in missing information, and facilitating cross-domain image analysis and comparison. By leveraging deep learning algorithms, image-to-image translation has become a powerful tool in the arsenal of remote sensing researchers and practitioners. Image source
How to Develop a Pix2Pix GAN for Image-to-Image Translation -> article on machinelearningmastery.com on how to develop a Pix2Pix model for translating satellite photographs to Google map images. A good intro to GANs
A growing problem of ‘deepfake geography’: How AI falsifies satellite images
Kaggle Pix2Pix Maps -> dataset for training pix2pix to take a Google Maps satellite photo and build a street map
guided-deep-decoder -> With guided deep decoder, you can solve different image pair fusion problems, allowing super-resolution, pansharpening or denoising
hackathon-ci-2020 -> generate nighttime imagery from infrared observations
satellite-to-satellite-translation -> VAE-GAN architecture for unsupervised image-to-image translation with shared spectral reconstruction loss. Model is trained on GOES-16/17 and Himawari-8 L1B data
PyTorch implementation of UNet for converting aerial satellite images into Google Maps-style images
Seamless-Satellite-image-Synthesis -> generate arbitrarily large RGB images from a map
Satellite-Imagery-to-Map-Translation-using-Pix2Pix-GAN-framework
RSIT_SRM_ISD -> PyTorch implementation of Remote sensing image translation via style-based recalibration module and improved style discriminator
pix2pix_google_maps -> Converts satellite images to map images using pix2pix models
sar2color-igarss2018-chainer -> Image Translation Between Sar and Optical Imagery with Generative Adversarial Nets
HSI2RGB -> Create realistic looking RGB images using remote sensing hyperspectral images
sat_to_map -> Learning mappings to generate city maps images from corresponding satellite images
pix2pix-GANs -> Generate Map using Satellite Image & PyTorch
map-sat -> Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps
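The pix2pix-based projects above train the generator on two terms: an adversarial term that rewards fooling the discriminator, and an L1 term that keeps the output close to the target map; the pix2pix paper weights the L1 term with λ = 100. The NumPy sketch below computes only that objective from given discriminator outputs, with no networks involved; the non-saturating form of the adversarial term and the function name are our choices.

```python
import numpy as np


def pix2pix_generator_loss(d_fake, fake, target, lam=100.0, eps=1e-7):
    """Generator objective in the style of pix2pix: adversarial term + lam * L1.

    d_fake: discriminator probabilities in (0, 1) for the generated images,
            e.g. one value per PatchGAN patch.
    fake, target: generated and ground-truth images of the same shape.
    """
    d_fake = np.clip(np.asarray(d_fake, dtype=np.float64), eps, 1.0 - eps)
    adversarial = -np.mean(np.log(d_fake))  # non-saturating GAN loss
    l1 = np.mean(np.abs(np.asarray(target) - np.asarray(fake)))
    return adversarial + lam * l1


img = np.zeros((4, 4))  # hypothetical target map tile
# A perfect reconstruction that the discriminator is unsure about: loss = log(2).
print(round(pix2pix_generator_loss(np.full(3, 0.5), img, img), 4))  # 0.6931
```

Because λ is large, the L1 term dominates early in training and anchors the output to the target layout, while the adversarial term mainly sharpens textures.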
Data fusion
![Data fusion](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/fusion.png)
Data fusion is a technique for combining information from different sources such as Synthetic Aperture Radar (SAR), optical imagery, and non-imagery data such as Internet of Things (IoT) sensor data. The integration of diverse data sources enables data fusion to overcome the limitations of individual sources, leading to the creation of models that are more accurate and informative than those constructed from a single source. Image source
Awesome-Data-Fusion-for-Remote-Sensing
UDALN_GRSL -> Deep Unsupervised Blind Hyperspectral and Multispectral Data Fusion
CropTypeMapping -> Crop type mapping from optical and radar (Sentinel-1&2) time series using attention-based deep learning
Multimodal-Remote-Sensing-Toolkit -> uses Hyperspectral and LiDAR Data
Aerial-Template-Matching -> development of an algorithm for template matching on aerial imagery applied to a UAV dataset
DDA_UrbanExtraction -> Unsupervised Domain Adaptation for Global Urban Extraction using Sentinel-1 and Sentinel-2 Data
swinstfm -> Remote Sensing Spatiotemporal Fusion using Swin Transformer
LoveCS -> Cross-sensor domain adaptation for high-spatial resolution urban land-cover mapping: from airborne to spaceborne imagery
comingdowntoearth -> Implementation of 'Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization'
Matching between acoustic and satellite images
MapRepair -> Deep Cadastre Maps Alignment and Temporal Inconsistencies Fix in Satellite Images
Compressive-Sensing-and-Deep-Learning-Framework -> Compressive Sensing is used as an initial guess to combine data from multiple sources, with LSTM used to refine the result
DeepSim -> DeepSIM: GPS Spoofing Detection on UAVs using Satellite Imagery Matching
MHF-net -> Multispectral and Hyperspectral Image Fusion by MS/HS Fusion Net
Remote_Sensing_Image_Fusion -> Semi-Supervised Remote Sensing Image Fusion Using Multi-Scale Conditional Generative Adversarial network with Siamese Structure
CNNs for Multi-Source Remote Sensing Data Fusion -> Single-stream CNN with Learnable Architecture for Multi-source Remote Sensing Data
Deep Generative Reflectance Fusion -> Achieving Landsat-like reflectance at any date by fusing Landsat and MODIS surface reflectance with deep generative models
IEEE_TGRS_MDL-RS -> More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification
SSRNET -> SSR-NET: Spatial-Spectral Reconstruction Network for Hyperspectral and Multispectral Image Fusion
cross-view-image-matching -> Bridging the Domain Gap for Ground-to-Aerial Image Matching
CoF-MSMG-PCNN -> Remote Sensing Image Fusion via Boundary Measured Dual-Channel PCNN in Multi-Scale Morphological Gradient Domain
robust_matching_network_on_remote_sensing_imagery_pytorch -> A Robust Matching Network for Gradually Estimating Geometric Transformation on Remote Sensing Imagery
edcstfn -> An Enhanced Deep Convolutional Model for Spatiotemporal Image Fusion
ganstfm -> A Flexible Reference-Insensitive Spatiotemporal Fusion Model for Remote Sensing Images Using Conditional Generative Adversarial Network
CMAFF -> Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery
SOLC -> MCANet: A joint semantic segmentation framework of optical and SAR images for land use classification. Uses WHU-OPT-SAR-dataset
MFT -> Multimodal Fusion Transformer for Remote Sensing Image Classification
HSHT-Satellite-Imagery-Synthesis -> Improving Flood Maps by Increasing the Temporal Resolution of Satellites Using Hybrid Sensor Fusion
MDC -> Unsupervised Data Fusion With Deeper Perspective: A Novel Multisensor Deep Clustering Algorithm
FusAtNet -> FusAtNet: Dual Attention based SpectroSpatial Multimodal Fusion Network for Hyperspectral and LiDAR Classification
AMM-FuseNet -> Attention-Based Multi-Modal Image Fusion Network for Land Cover Mapping
MANet -> MANet: A Network Architecture for Remote Sensing Spatiotemporal Fusion Based on Multiscale and Attention Mechanisms
deforestation-from-data-fusion -> Fusing Sentinel-1 and Sentinel-2 images for deforestation detection in the Brazilian Amazon under diverse cloud conditions
sct-fusion -> Transformer-based Multi-Modal Learning for Multi Label Remote Sensing Image Classification
RSI-MMSegmentation -> GAMUS: A Geometry-aware Multi-modal Semantic Segmentation Benchmark for Remote Sensing Data
dfc2022-baseline -> baseline solution to the 2022 IEEE GRSS Data Fusion Contest (DFC2022) using TorchGeo, PyTorch Lightning, and Segmentation Models PyTorch to train a U-Net with a ResNet-18 backbone and a loss function of Focal + Dice loss to perform semantic segmentation on the DFC2022 dataset
multiviewRS-models -> List of multi-view fusion learning models proposed for remote sensing (RS) multi-view data
Generative networks
![Generative networks](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/gan.png)
Generative networks (e.g. GANs) aim to generate new, synthetic data that appears similar to real-world data. This generated data can be used for a wide range of purposes, including data augmentation, correcting class imbalance, and filling in missing or corrupted data. Incorporating synthetic data can improve the performance of remote sensing algorithms and models, leading to more accurate and reliable results. Image source
Using Generative Adversarial Networks to Address Scarcity of Geospatial Training Data -> GANs perform better than CNNs in segmenting land cover classes outside of the training dataset (article, no code)
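For reference, the original GAN formulation (Goodfellow et al., 2014) trains a generator G and a discriminator D against each other on the minimax objective below. Here x is a real image and z is a random noise vector. Conditional variants such as pix2pix also feed an input image to both networks and add task-specific terms, for example an L1 reconstruction loss.

```latex
\min_{G}\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```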
Building-A-Nets -> robust building extraction from high-resolution remote sensing images with adversarial networks
GANmapper -> a building footprint generator using Generative Adversarial Networks
CSA-CDGAN -> Channel Self-Attention Based Generative Adversarial Network for Change Detection of Remote Sensing Images
DSGAN -> a conditional GAN for dynamic precipitation downscaling
MarsGAN -> GAN trained on satellite photos of Mars
HC_ADGAN -> code for the paper Adaptive Dropblock Enhanced Generative Adversarial Networks for Hyperspectral Image Classification
SCALAE -> Formatting the Landscape: Spatial conditional GAN for varying population in satellite imagery. Method to generate satellite imagery from custom 2D population maps
Satellite-Image-Forgery-Detection-and-Localization
STGAN -> PyTorch Implementation of STGAN for Cloud Removal in Satellite Images
ds-gan-spatiotemporal-evaluation -> evaluating use of deep generative models in remote sensing applications
GAN-based method to generate high-resolution remote sensing for data augmentation and image classification
Remote-Sensing-Image-Generation -> Generate RS Images using Generative Adversarial Networks (GAN)
PSGan-Family -> A Generative Adversarial Network for Remote Sensing Image Pan-Sharpening
Satellite Image Augmentation with GANs -> Image Augmentation for Satellite Images
opt2sar-cyclegan -> Research on SAR image generation method based on non-homologous data
sentinel-cgan -> code for article: Generative adversarial networks in satellite image datasets augmentation
Shoreline_Extraction_GAN -> Shoreline extraction via generative adversarial networks, prediction via LSTMs
Landsat8-Sentinel2-Fusion -> Translating Landsat 8 to Sentinel-2 using a GAN
Seg2Sat -> Seg2Sat explores the potential of diffusion algorithms such as StableDiffusion and ControlNet to generate aerial images based on terrain segmentation data
SAR2Optical -> Transcoding Sentinel-1 SAR to Sentinel-2 using cGAN
Autoencoders, dimensionality reduction, image embeddings & similarity search
![Autoencoder](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/autoencoder.png)
Autoencoders are a type of neural network that aims to simplify the representation of input data by compressing it into a lower-dimensional form. This is achieved in two steps: the encoder compresses the data into a lower-dimensional representation, and the decoder reconstructs an approximation of the original input from it. The goal is to reduce the data's dimensionality, making it easier to store and process, while retaining the essential information.
Dimensionality reduction, as the name suggests, refers to the process of reducing the number of dimensions in a dataset. It can be achieved through classical techniques such as principal component analysis (PCA) or singular value decomposition (SVD). Autoencoders are one type of neural network that can be used for the same purpose.
In computer vision, image embeddings are vector representations of images that capture their most important features. These embeddings can be used for similarity search, where images are compared by their features to find similar ones. This supports applications such as image retrieval, where images are searched by criteria like color, texture, or shape, and the identification of duplicate images in a dataset. Image source
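Below is a minimal sketch of brute-force similarity search over image embeddings using cosine similarity. The three-dimensional vectors and tile ids are hypothetical. In practice the embeddings would come from an encoder, such as the bottleneck of an autoencoder or a pretrained CNN.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query, index, k=2):
    """Return the ids of the k embeddings most similar to query (brute force)."""
    scored = [(cosine(query, emb), image_id) for image_id, emb in index.items()]
    scored.sort(reverse=True)
    return [image_id for _, image_id in scored[:k]]


index = {  # hypothetical embeddings keyed by image tile id
    "tile_a": [0.9, 0.1, 0.0],
    "tile_b": [0.8, 0.2, 0.1],
    "tile_c": [0.0, 0.1, 0.9],
}
print(top_k_similar([1.0, 0.0, 0.0], index))  # → ['tile_a', 'tile_b']
```

Scanning every embedding is fine for small catalogues. Large archives typically switch to an approximate nearest-neighbour index. Near-duplicate detection can reuse the same score with a similarity threshold.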
Autoencoders & their Application in Remote Sensing -> intro article and example use case applied to SAR data for land classification
LEt-SNE -> Dimensionality Reduction and visualization technique that compensates for the curse of dimensionality
AutoEncoders for Land Cover Classification of Hyperspectral Images -> An autoencoder neural net is used to reduce 103-band data to 60 features (dimensionality reduction), Keras. Also read part 2, which implements K-NNC, SVM and Gradient Boosting
Image-Similarity-Search -> an app that helps perform super fast image retrieval on PyTorch models for better embedding space interpretability
Interactive-TSNE -> a tool that provides a way to visually view a PyTorch model's feature representation for better embedding space interpretability
RoofNet -> identify roof age using historical satellite images to lower the customer acquisition cost for new solar installations. Uses a VAE: Variational Autoencoder
Visual search over billions of aerial and satellite images -> implemented at Descartes Labs
parallax -> Tool for interactive embeddings visualization
MXNet repository for generating embeddings on satellite images -> includes sampling of images, mining algorithms, different architectures, error functions, and measures for evaluation.
Fine tuning CLIP with Remote Sensing (Satellite) images and captions -> fine tuning CLIP on the RSICD image captioning dataset, to enable querying large catalogues in natural language. With repo, uses 🤗
Image search with 🤗 datasets -> tutorial on fine tuning an image search model
GRN-SNDL -> model the relations between samples (or scenes) by making use of a graph structure which is fed into network learning
SauMoCo -> Deep Unsupervised Embedding for Remotely Sensed Images Based on Spatially Augmented Momentum Contrast
TGRS_RiDe -> Rotation Invariant Deep Embedding for Remote Sensing Images
RaVAEn -> RaVAEn is a lightweight, unsupervised approach for change detection in satellite data based on Variational Auto-Encoders (VAEs) with the specific purpose of on-board deployment
Reverse image search using deep discrete feature extraction and locality-sensitive hashing
SNCA_CE -> Deep Metric Learning based on Scalable Neighborhood Components for Remote Sensing Scene Characterization
LandslideDetection-from-satellite-imagery -> Using Attention and Autoencoder boosted CNN
split-brain-remote-sensing -> Analysis of Color Space Quantization in Split-Brain Autoencoder for Remote Sensing Image Classification
image-similarity-measures -> Implementation of eight evaluation metrics to assess the similarity between two images. Blog post here
Large_Scale_GeoVisual_Search -> ResNet architecture on UC Merced Land Use Dataset with hamming distance for similarity based search
geobacter -> Generates useful feature embeddings for geospatial locations
Satellite-Image-Segmentation -> the KV-Net model uses autoencoders to reconnect disconnected roads
Satellite-Image-Enhancement -> Image enhancement using GANs and autoencoders
Variational-Autoencoder-For-Satellite-Imagery -> a special VAE to squeeze N images into one single representation, with colors segmenting the different objects
DINCAE -> Data-Interpolating Convolutional Auto-Encoder is a neural network to reconstruct missing data in satellite observations
3D_SITS_Clustering -> Unsupervised Satellite Image Time Series Clustering Using Object-Based Approaches and 3D Convolutional Autoencoder
sat_cnn -> Estimating Generalized Measures of Local Neighbourhood Context from Multispectral Satellite Images Using a Convolutional Neural Network. Uses a convolutional autoencoder (CAE)
you-are-here -> You Are Here: Geolocation by Embedding Maps and Images
Tensorflow similarity -> offers state-of-the-art algorithms for metric learning and all the necessary components to research, train, evaluate, and serve similarity-based models
Train SimSiam on Satellite Images using lightly.ai to generate embeddings that can be used for data exploration and understanding
Airbus_SDC_dup -> Project focused on detecting duplicate regions of overlapping satellite imagery. Applied to Airbus ship detection dataset
scale-mae -> Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
Cross-Scale-MAE -> code for paper: Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing
satclip -> A Global, General-Purpose Geographic Location Encoder from Microsoft
Astronaut Photography Localization & Iterative Coregistration
rs-cbir -> Satellite Image Vector Database and Multimodal Search using fine-tuned ResNet50 on AID dataset
TorchSpatial -> A Location Encoding Framework and Benchmark for Spatial Representation Learning
Anomaly detection
Anomaly detection refers to the process of identifying unusual patterns or outliers in satellite or aerial images that do not conform to expected norms. This is crucial in applications such as environmental monitoring, defense surveillance, and urban planning. Machine learning algorithms, particularly unsupervised learning methods, are used to analyze vast amounts of remote sensing data efficiently. These algorithms learn the typical patterns and variations in the data, allowing them to flag anomalies such as unexpected land cover changes, illegal deforestation, or unusual maritime activities. The detection of these anomalies can provide valuable insights for timely decision-making and intervention in various fields.
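Below is a minimal statistical baseline for flagging outliers, using z-scores on a hypothetical series of per-parcel NDVI values. It is not the method used by the repositories listed below, which learn typical patterns from data. The same thresholding step can nevertheless be applied to a learned score, such as an autoencoder's reconstruction error. The threshold of 2.5 is an assumed value.

```python
from statistics import mean, pstdev


def flag_anomalies(values, z_threshold=3.0):
    """Return indices whose absolute z-score exceeds z_threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]


# Hypothetical NDVI per parcel; the last parcel shows a sudden vegetation loss.
ndvi = [0.71, 0.69, 0.72, 0.70, 0.68, 0.73, 0.70, 0.71, 0.69, 0.72, 0.15]
print(flag_anomalies(ndvi, z_threshold=2.5))  # → [10]
```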
marine-anomaly-detection -> Semantic segmentation of marine anomalies using semi-supervised learning (FixMatch for semantic segmentation) on Sentinel-2 multispectral images
TDD -> One-Step Detection Paradigm for Hyperspectral Anomaly Detection via Spectral Deviation Relationship Learning
anomaly-detection-in-SAR-imagery -> identify an unknown ship in docks using Keras & RetinaNet
pub-ffi-gan -> Applying generative adversarial networks for anomaly detection in hyperspectral remote sensing imagery
How Airbus Detects Anomalies in ISS Telemetry Data Using TFX -> uses an autoencoder
AgriSen-COG -> a Multicountry, Multitemporal Large-Scale Sentinel-2 Benchmark Dataset for Crop Mapping: includes an anomaly detection preprocessing step
Image retrieval
![Image retrieval](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/retrieval.png)
Image retrieval is the task of retrieving images from a collection that are similar to a query image. Image retrieval plays a vital role in remote sensing by enabling the efficient and effective search for relevant images from large image archives, and by providing a way to quantify changes in the environment over time. Image source
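Hash-based retrieval methods such as AHCL and MHCLN, listed below, compress each image into a short binary code and rank the archive by Hamming distance. The sketch below assumes the simplest possible hashing scheme: set a bit wherever an embedding value is positive. Learned methods instead train the network to produce the codes. The scene names and embeddings are hypothetical.

```python
def to_hash(embedding):
    """Binarize an embedding into an integer code: bit i is set when value i > 0."""
    return sum(1 << i for i, v in enumerate(embedding) if v > 0)


def hamming(a, b):
    """Number of differing bits between two integer hash codes."""
    return bin(a ^ b).count("1")


def retrieve(query_embedding, archive, k=2):
    """Rank archive items by Hamming distance to the query's hash code."""
    q = to_hash(query_embedding)
    ranked = sorted(archive, key=lambda name: (hamming(q, archive[name]), name))
    return ranked[:k]


archive = {name: to_hash(emb) for name, emb in {
    "scene_1": [0.3, -0.2, 0.8, -0.5],
    "scene_2": [0.1, -0.4, 0.6, 0.2],
    "scene_3": [-0.7, 0.5, -0.1, 0.9],
}.items()}
print(retrieve([0.5, -0.1, 0.4, -0.3], archive))  # → ['scene_1', 'scene_2']
```

Comparing codes with XOR and a bit count is what makes hashing attractive for large archives: each comparison is far cheaper than a floating-point distance over the full embedding.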
Demo_AHCL_for_TGRS2022 -> Asymmetric Hash Code Learning (AHCL) for remote sensing image retrieval
GaLR -> Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information
retrievalSystem -> cross-modal image retrieval system
AMFMN -> Exploring a Fine-grained Multiscale Method for Cross-modal Remote Sensing Image Retrieval
Active-Learning-for-Remote-Sensing-Image-Retrieval -> unofficial implementation of paper: A Novel Active Learning Method in Relevance Feedback for Content-Based Remote Sensing Image Retrieval
CMIR-NET -> A deep learning based model for cross-modal retrieval in remote sensing
Deep-Hash-learning-for-Remote-Sensing-Image-Retrieval -> Deep Hash Learning for Remote Sensing Image Retrieval
MHCLN -> Deep Metric and Hash-Code Learning for Content-Based Retrieval of Remote Sensing Images
HydroViet_VOR -> Object Retrieval in satellite images with Triplet Network
remote-sensing-image-retrieval -> Multi-Spectral Remote Sensing Image Retrieval using Geospatial Foundation Models (IBM Prithvi)
Composed Image Retrieval for Remote Sensing
CSMAE -> About Cross-Sensor Masked Autoencoder for Content Based Image Retrieval in Remote Sensing
Image Captioning
![Image captioning](https://github.com/satellite-image-deep-learning/techniques/raw/master/images/captioned.png)
Image Captioning is the task of automatically generating a textual description of an image. In remote sensing, image captioning can be used to automatically generate captions for satellite or aerial images, which can be useful for a variety of purposes, such as image search and retrieval, data cataloging, and data dissemination. The generated captions can provide valuable information about the content of the images, including the location, the type of terrain or objects present, and the weather conditions, among others. This information can be used to quickly and easily understand the content of the images, without having to manually examine each image. Image source
awesome-remote-image-captioning -> a list of awesome remote sensing image captioning resources
awesome-vision-language-models-for-earth-observation
CapFormer -> Pure transformer for remote sensing image caption
remote_sensing_image_captioning -> Region Driven Remote Sensing Image Captioning
Remote Sensing Image Captioning with Transformer and Multilabel Classification
Siamese-spatial-Graph-Convolution-Network -> Siamese graph convolutional network for content based remote sensing image retrieval
MLAT -> Remote-Sensing Image Captioning Based on Multilayer Aggregated Transformer
WordSent -> Word–Sentence Framework for Remote Sensing Image Captioning
a-mask-guided-transformer-with-topic-token -> A Mask-Guided Transformer Network with Topic Token for Remote Sensing Image Captioning
Meta captioning -> A meta learning based remote sensing image captioning framework
Transformer-for-image-captioning -> a transformer for image captioning, trained on the UCM dataset
remote-sensing-image-caption -> image classification and image captioning using PyTorch
Fine tuning CLIP with Remote Sensing (Satellite) images and captions -> fine tuning CLIP on the RSICD image captioning dataset, to enable querying large catalogues in natural language. With repo, uses 🤗. Also read Why and How to Fine-tune CLIP
Visual Question Answering
Visual Question Answering (VQA) is the task of automatically answering a natural language question about an image. In remote sensing, VQA enables users to interact with the images and retrieve information using natural language questions. For example, a user could ask a VQA system questions such as "What is the type of land cover in this area?", "What is the dominant crop in this region?" or "What is the size of the city in this image?". The system would then analyze the image and generate an answer based on its understanding of the image content.
VQA-easy2hard -> From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data
lit4rsvqa -> LiT-4-RSVQA: Lightweight Transformer-based Visual Question Answering in Remote Sensing
Change-Agent -> Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis
Mixed data learning
Mixed data learning is the process of learning from datasets that may contain a mix of image, text, and numeric data. It can improve model accuracy by allowing a model to learn from multiple sources at once and to identify patterns and correlations that no single source reveals on its own.
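Below is a minimal sketch of the simplest way to combine modalities: concatenate an image embedding with encoded tabular attributes into a single feature vector that any downstream model can consume. The record fields (`elevation_m`, `land_use`), the class list, and the elevation range are hypothetical. Multi-input networks, like the PyTorch-Lightning example listed below, instead process each modality in its own branch and merge the learned representations.

```python
def one_hot(value, categories):
    """Encode a categorical value as a one-hot list over known categories."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max(value, lo, hi):
    """Scale a numeric value into [0, 1] given its expected range."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0


def build_features(image_embedding, record, land_use_classes, elevation_range):
    """Concatenate image features with encoded tabular attributes."""
    lo, hi = elevation_range
    return (
        list(image_embedding)
        + [min_max(record["elevation_m"], lo, hi)]
        + one_hot(record["land_use"], land_use_classes)
    )


classes = ["residential", "industrial", "agricultural"]
features = build_features(
    image_embedding=[0.12, -0.40, 0.88],  # e.g. output of a CNN backbone
    record={"elevation_m": 350.0, "land_use": "industrial"},
    land_use_classes=classes,
    elevation_range=(0.0, 1000.0),
)
print(features)
```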
Predicting the locations of traffic accidents with satellite imagery and convolutional neural networks -> Combining satellite imagery and structured data to predict the location of traffic accidents with a neural network of neural networks, with repo
Multi-Input Deep Neural Networks with PyTorch-Lightning - Combine Image and Tabular Data -> excellent intro article using PyTorch, not actually applied to satellite data but to real estate data, with repo
Joint Learning from Earth Observation and OpenStreetMap Data to Get Faster Better Semantic Maps -> fusion based architectures and coarse-to-fine segmentation to include the OpenStreetMap layer into multispectral-based deep fully convolutional networks, arxiv paper
pyimagesearch article on mixed-data
pytorch-widedeep -> A flexible package for multimodal deep learning to combine tabular data with text and images using Wide and Deep models in PyTorch
accidentRiskMap -> Inferring high-resolution traffic accident risk maps based on satellite imagery and GPS trajectories
Sub-meter resolution canopy height map by Meta -> Satellite Metadata combined with outputs from simple CNN to regress canopy height
Few & zero shot learning
Few-shot and zero-shot learning are a class of techniques that attempt to make predictions for classes with few, one, or even zero examples provided during training. In zero-shot learning (ZSL), the model is assisted by auxiliary information, typically descriptions, semantic attributes, or word embeddings for both the seen and unseen classes at train time. These approaches are particularly relevant to remote sensing, where there may be many examples of common classes but few or even zero examples of other classes of interest.
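Below is a minimal sketch of prototype-based few-shot classification, the idea behind prototypical networks, which prototype methods such as SPNet (listed below) build on. Each class prototype is the mean of its few support embeddings, and a query is assigned to the nearest prototype. The two-dimensional embeddings and class names are hypothetical; in practice they come from a trained encoder.

```python
import math


def prototype(embeddings):
    """Class prototype = element-wise mean of the support embeddings."""
    n = len(embeddings)
    return [sum(dim) / n for dim in zip(*embeddings)]


def classify(query, support):
    """Assign query to the class whose prototype is nearest (Euclidean)."""
    prototypes = {label: prototype(embs) for label, embs in support.items()}
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


support = {  # a 2-way, 2-shot episode with hypothetical embeddings
    "solar_farm": [[0.9, 0.1], [0.8, 0.2]],
    "greenhouse": [[0.2, 0.9], [0.1, 0.7]],
}
print(classify([0.7, 0.3], support))  # → solar_farm
```

A zero-shot variant would replace the support means with embeddings derived from the auxiliary class descriptions, leaving the nearest-prototype step unchanged.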
Aerial-SAM -> Zero-Shot Refinement of Buildings’ Segmentation Models using SAM
FSODM -> Few-shot Object Detection on Remote Sensing Images
Few-Shot Classification of Aerial Scene Images via Meta-Learning -> 2020 publication, a classification model that can quickly adapt to unseen categories using only a few labeled samples
Papers about Few-shot Learning / Meta-Learning on Remote Sensing
SPNet -> Siamese-Prototype Network for Few-Shot Remote Sensing Image Scene Classification
MDL4OW -> Few-Shot Hyperspectral Image Classification With Unknown Classes Using Multitask Deep Learning
P-CNN -> Prototype-CNN for Few-Shot Object Detection in Remote Sensing Images
CIR-FSD-2022 -> Context Information Refinement for Few-Shot Object Detection in Remote Sensing Images
IEEE_TNNLS_Gia-CFSL -> Graph Information Aggregation Cross-Domain Few-Shot Learning for Hyperspectral Image Classification
TIP_2022_CMFSL -> Few-shot Learning with Class-Covariance Metric for Hyperspectral Image Classification
sen12ms-human-few-shot-classifier -> Humans are poor few-shot classifiers for Sentinel-2 land cover
S3Net -> S3Net: Spectral–Spatial Siamese Network for Few-Shot Hyperspectral Image Classification
SiameseNet-for-few-shot-Hyperspectral-Classification -> 3DCSN:SiameseNet-for-few-shot-Hyperspectral-Classification
MESSL -> Multiform Ensemble Self-Supervised Learning for Few-Shot Remote Sensing Scene Classification
SCCNet -> Self-Correlation and Cross-Correlation Learning for Few-Shot Remote Sensing Image Semantic Segmentation
OEM-Fewshot-Challenge -> OpenEarthMap Land Cover Mapping Few-Shot Challenge Generalized Few-shot Semantic Segmentation
meteor -> a small deep learning meta-model with a single output
SegLand -> Generalized Few-Shot Meets Remote Sensing: Discovering Novel Classes in Land Cover Mapping via Hybrid Semantic Segmentation Framework. 1st place in the OpenEarthMap Land Cover Mapping Few-Shot Challenge
Self-supervised, unsupervised & contrastive learning
Self-supervised, unsupervised & contrastive learning are all machine learning methods that use unlabeled data to train algorithms. Self-supervised learning derives supervisory signals from the data itself, for example by predicting masked patches or matching different views of the same scene. Unsupervised learning uses only the data itself to identify patterns and similarities, for example through clustering. Contrastive learning uses pairs of data points to learn representations in which related samples lie close together and unrelated ones far apart; these representations are commonly reused for downstream tasks such as classification. Note that self-supervised approaches are commonly used to train so-called foundation models, since they enable learning from large quantities of unlabelled data, often including time series.
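Below is a minimal sketch of the InfoNCE loss that underlies many contrastive methods, written in plain Python for a small batch. It follows the spirit of geography-aware-ssl (listed below), assuming that anchors[i] and positives[i] are embeddings of the same location at two different times. Every other item in the batch acts as a negative. The 2-D vectors and the temperature of 0.1 are assumptions for illustration.

```python
import math


def normalise(v):
    """Scale a vector to unit L2 length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch of paired, unit-length embeddings."""
    losses = []
    for i, a in enumerate(anchors):
        logits = [sum(x * y for x, y in zip(a, p)) / temperature for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_sum_exp - logits[i])  # cross-entropy with target i
    return sum(losses) / len(losses)


anchors = [normalise([1.0, 0.0]), normalise([0.0, 1.0])]
aligned = [normalise([0.9, 0.1]), normalise([0.1, 0.9])]  # matching views
swapped = list(reversed(aligned))                         # mismatched views
print(info_nce(anchors, aligned), info_nce(anchors, swapped))
```

The loss is low when each anchor is most similar to its own positive, and high when the pairs are mismatched. Minimising it therefore pulls matching views together.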
Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data -> Seasonal Contrast (SeCo) is an effective pipeline to leverage unlabeled data for in-domain pre-training of remote sensing representations. Models trained with SeCo achieve better performance than their ImageNet pre-trained counterparts and state-of-the-art self-supervised learning methods on multiple downstream tasks. paper and repo
Unsupervised Learning for Land Cover Classification in Satellite Imagery
Tile2Vec: Unsupervised representation learning for spatially distributed data
Contrastive Sensor Fusion -> Code implementing Contrastive Sensor Fusion, an approach for unsupervised learning of multi-sensor representations targeted at remote sensing imagery
hyperspectral-autoencoders -> Tools for training and using unsupervised autoencoders and supervised deep learning classifiers for hyperspectral data, built on TensorFlow. Autoencoders are unsupervised neural networks that are useful for a range of applications such as unsupervised feature learning and dimensionality reduction.
Sentinel-2 image clustering in python
MARTA GANs: Unsupervised Representation Learning for Remote Sensing Image Classification
A generalizable and accessible approach to machine learning with global satellite imagery (Nature Communications publication) -> MOSAIKS is designed to solve an unlimited number of tasks at planet scale quickly using feature vectors, with repo. Also see mosaiks-api
contrastive-satellite -> Using contrastive learning to create embeddings from optical EuroSAT Sentinel-2 imagery
Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding
Self-Supervised-Learner by spaceml-org -> train a classifier with fewer labeled examples needed using self-supervised learning, example applied to UC Merced land use dataset
deepsentinel -> a Sentinel-1 and Sentinel-2 self-supervised sensor fusion model for general-purpose semantic embedding
geography-aware-ssl -> uses spatially aligned images over time to construct temporal positive pairs in contrastive learning and geo-location to design pre-text tasks
CNN-Supervised Classification -> Python code for self-supervised classification of remotely sensed imagery - part of the Deep Riverscapes project
clustimage -> a python package for unsupervised clustering of images
LandSurfaceClustering -> Land surface classification using remote sensing data with unsupervised machine learning (k-means)
K-Means Clustering for Surface Segmentation of Satellite Images
Sentinel-2 satellite imagery for crop classification using unsupervised clustering -> label groups of pixels based on temporal trends of their NDVI values
TheColorOutOfSpace -> The color out of space: learning self-supervised representations for Earth Observation imagery, using the BigEarthNet dataset
STEGO -> Unsupervised Semantic Segmentation by Distilling Feature Correspondences, with paper
SoundingEarth -> Self-supervised Audiovisual Representation Learning for Remote Sensing Data, uses the SoundingEarth Dataset
SSLRemoteSensing -> Semantic Segmentation of Remote Sensing Images With Self-Supervised Multitask Representation Learning
CBT -> Continual Barlow Twins: continual self-supervised learning for remote sensing semantic segmentation
Unsupervised Satellite Image Classification based on Partial Adversarial Domain Adaptation -> Code for course project
T2FTS -> Teaching Teachers First and Then Student: Hierarchical Distillation to Improve Long-Tailed Object Recognition in Aerial Images
DINO-MM -> Self-supervised Vision Transformers for Joint SAR-optical Representation Learning
SSL4EO-S12 -> a large-scale dataset for self-supervised learning in Earth observation
SSL4EO-Review -> Self-supervised Learning in Remote Sensing: A Review
transfer_learning_cspt -> Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain
OTL -> Clustering-Based Representation Learning through Output Translation and Its Application to Remote-Sensing Images
vissl_experiments -> Self-supervised Learning using Facebook VISSL on the RESISC-45 satellite imagery classification dataset
MS2A-Net -> MS²A-Net: Multi-scale spectral-spatial association network for hyperspectral image clustering
pytorch-ssl-building_extract -> Research on Self-Supervised Building Information Extraction with High-Resolution Remote Sensing Images for Photovoltaic Potential Evaluation
self-rare-wildlife -> Self-Supervised Pretraining and Controlled Augmentation Improve Rare Wildlife Recognition in UAV Images
SatMAE -> SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery
FireCLR-Wildfires -> Unsupervised Wildfire Change Detection based on Contrastive Learning
FALSE -> False Negative Samples Aware Contrastive Learning for Semantic Segmentation of High-Resolution Remote Sensing Image
MATTER -> Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks
FGMAE -> Feature guided masked Autoencoder for self-supervised learning in remote sensing
GFM -> Towards Geospatial Foundation Models via Continual Pretraining
SatViT -> self-supervised training of multispectral optical and SAR vision transformers
SITS-MoCo -> Self-supervised pre-training for large-scale crop mapping using Sentinel-2 time series
Weakly & semi-supervised learning
Weakly & semi-supervised learning are two machine learning approaches that make use of imperfect or partial annotations. Weakly supervised learning uses weakly labeled data, whose labels may be incomplete, coarse, or inaccurate (for example image-level tags or points instead of pixel-wise masks). Semi-supervised learning combines a small labeled dataset with a larger pool of unlabeled data. Weakly supervised learning is typically used when precise labels are too expensive to produce at scale, and semi-supervised learning when labeled data is scarce but unlabeled data is abundant. Both techniques can improve the accuracy of machine learning models by making use of additional data sources.
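A common semi-supervised ingredient is confidence-thresholded pseudo-labelling, the selection step used by FixMatch. The sketch below shows only that step: it keeps unlabeled samples whose top predicted class probability clears a threshold and treats that class as a label. FixMatch's weak/strong augmentation pairing is omitted. The probabilities are hypothetical model outputs, and 0.95 is an assumed threshold.

```python
def pseudo_label(probabilities, threshold=0.95):
    """Return (sample index, pseudo-label) pairs for confident predictions."""
    selected = []
    for idx, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((idx, probs.index(confidence)))
    return selected


unlabeled_probs = [
    [0.97, 0.02, 0.01],  # confident: pseudo-labelled as class 0
    [0.40, 0.35, 0.25],  # uncertain: ignored
    [0.01, 0.01, 0.98],  # confident: pseudo-labelled as class 2
]
print(pseudo_label(unlabeled_probs))  # → [(0, 0), (2, 2)]
```

The threshold trades label quantity against label noise. A higher value admits fewer, cleaner pseudo-labels into the next training round.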
MARE -> self-supervised Multi-Attention REsu-net for semantic segmentation in remote sensing
SSGF-for-HRRS-scene-classification -> A semi-supervised generative framework with deep learning features for high-resolution remote sensing image scene classification
SFGAN -> Semantic-Fusion GANs for Semi-Supervised Satellite Image Classification
SSDAN -> Multi-Source Semi-Supervised Domain Adaptation Network for Remote Sensing Scene Classification
HR-S2DML -> High-Rankness Regularized Semi-Supervised Deep Metric Learning for Remote Sensing Imagery
Semantic Segmentation of Satellite Images Using Point Supervision
weak-segmentation -> Weakly supervised semantic segmentation for aerial images in pytorch
weakly_supervised -> Weakly Supervised Deep Learning for Segmentation of Remote Sensing Imagery. Demonstrates that segmentation can be performed using small datasets comprised of pixel or image labels
wan -> Weakly-Supervised Domain Adaptation for Built-up Region Segmentation in Aerial and Satellite Imagery
sourcerer -> A Bayesian-inspired deep learning method for semi-supervised domain adaptation designed for land cover mapping from satellite image time series (SITS)
MSMatch -> Semi-Supervised Multispectral Scene Classification with Few Labels. Includes code to work with both the RGB and the multispectral (MS) versions of EuroSAT dataset and the UC Merced Land Use (UCM) dataset
Flood Segmentation on Sentinel-1 SAR Imagery with Semi-Supervised Learning
Semi-supervised learning in satellite image classification -> experimenting with MixMatch and the EuroSAT data set
ICSS -> Weakly-supervised continual learning for class-incremental segmentation
es-CP -> Semi-Supervised Hyperspectral Image Classification Using a Probabilistic Pseudo-Label Generation Framework
Flood_Mapping_SSL -> Enhancement of Urban Floodwater Mapping From Aerial Imagery With Dense Shadows via Semisupervised Learning
Active learning
Supervised deep learning techniques typically require a huge number of annotated/labelled examples to provide a training dataset. However, labelling at scale takes significant time, expertise and resources. Active learning techniques aim to reduce the total amount of annotation required by selecting the most useful images to label from a large pool of unlabelled images, thus reducing the time needed to generate useful training datasets. These processes may be referred to as Human-in-the-Loop Machine Learning.
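One widely used way to pick "the most useful images" is uncertainty sampling. Below is a minimal sketch that ranks unlabelled images by the entropy of the current model's predicted class probabilities and sends the most uncertain ones to annotators. The image ids, probabilities, and labelling budget are hypothetical. Other selection criteria, such as diversity-based sampling, are also common.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labelling(pool, budget):
    """Return the ids of the `budget` most uncertain images in the pool."""
    ranked = sorted(pool, key=lambda image_id: entropy(pool[image_id]), reverse=True)
    return ranked[:budget]


pool = {  # hypothetical model predictions on unlabelled images
    "img_001": [0.98, 0.01, 0.01],
    "img_002": [0.34, 0.33, 0.33],
    "img_003": [0.60, 0.30, 0.10],
}
print(select_for_labelling(pool, budget=2))  # → ['img_002', 'img_003']
```

After the selected images are labelled, the model is retrained and the pool is scored again, which closes the human-in-the-loop cycle.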
Active learning for object detection in high-resolution satellite images
AIDE V2 - Tools for detecting wildlife in aerial images using active learning
AstronomicAL -> An interactive dashboard for visualisation, integration and classification of data using Active Learning
Follow tutorials for active learning for object detection and segmentation on the lightly platform.
Active-Labeler by spaceml-org -> a CLI Tool that facilitates labeling datasets with just a SINGLE line of code
Labelling platform for Mapping Africa active learning project
ALS4GAN -> Active Learning for Improved Semi Supervised Semantic Segmentation in Satellite Images
DIAL -> DIAL: Deep Interactive and Active Learning for Semantic Segmentation in Remote Sensing
whales -> An active learning pipeline for identifying whales in high-resolution satellite imagery, by Microsoft
AL4EO -> a QGIS plug-in to run Active Learning techniques on Earth observation data
Federated learning
Federated learning is an approach to distributed machine learning where a central processor coordinates the training of a model across many clients. The data remains distributed among different devices or locations: each client trains the model on its own local data, the central processor aggregates the model updates from all the clients into a global model, and then sends the global model parameters back to the clients. This helps protect the privacy of data, since the raw data stays on the local device and only model updates are shared with the central processor. The technique can also be used to train models on datasets too large to be stored on a single device, and to enable other privacy-preserving applications.
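The aggregation step can be illustrated with federated averaging (FedAvg), a widely used rule in which the central processor averages each layer of the clients' models, weighted by how many training samples each client holds. The sketch below is a minimal NumPy illustration with hypothetical names and toy weights; the implementations listed below may aggregate differently.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Aggregate client model parameters into global parameters (FedAvg-style).

    client_weights: one entry per client, each a list of np.ndarray layers
                    with identical shapes across clients.
    client_sizes:   number of local training samples on each client.
    """
    total = float(sum(client_sizes))
    global_weights = []
    for layer in range(len(client_weights[0])):
        # Each client's layer contributes in proportion to its share of the data
        layer_avg = sum(w[layer] * (n / total)
                        for w, n in zip(client_weights, client_sizes))
        global_weights.append(layer_avg)
    return global_weights

# Two hypothetical clients, each holding a one-layer model
client_a = [np.array([1.0, 2.0])]
client_b = [np.array([3.0, 4.0])]
global_model = federated_average([client_a, client_b], client_sizes=[100, 300])
print(global_model[0])  # → [2.5 3.5]
```

Only these parameter arrays travel between clients and the central processor; the training images themselves never leave the clients.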
Federated-Learning-for-Remote-Sensing -> implementation of three Federated Learning models
Semantic-Segmentation-UNet-Federated -> FedUKD: Federated UNet Model with Knowledge Distillation for Land Use Classification from Satellite and Street Views
MM-FL -> Learning Across Decentralized Multi-Modal Remote Sensing Archives with Federated Learning
Adversarial ML
Efforts to detect falsified images & deepfakes
UAE-RS -> dataset that provides black-box adversarial samples in the remote sensing field
PSGAN -> Perturbation Seeking Generative Adversarial Networks: A Defense Framework for Remote Sensing Image Scene Classification
SACNet -> Self-Attention Context Network: Addressing the Threat of Adversarial Attacks for Hyperspectral Image Classification
Image registration
Image registration is the process of registering one or more images onto another (typically well georeferenced) image. Traditionally this is performed manually by identifying control points (tie-points) in the images, for example using QGIS. This section lists approaches which mostly aim to automate this manual process. There is some overlap with the data fusion section but the distinction I make is that image registration is performed as a prerequisite to downstream processes which will use the registered data as an input.
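Once tie-points have been identified (manually, or by one of the automated matchers below), the transformation between the two images can be estimated from them. The sketch below fits a 2D affine transform by least squares. It assumes an affine model is adequate (for example a small, flat area); the function names are hypothetical, and real georeferencing tools may fit higher-order polynomial or other transformation models.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares 2D affine transform mapping src_pts onto dst_pts.

    src_pts, dst_pts: (n, 2) arrays of matched tie-points, n >= 3
    (e.g. pixel coordinates in the image to register and in the
    georeferenced reference image).
    Returns a 2x3 matrix A such that [x', y'] = A @ [x, y, 1].
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    X = np.hstack([src, np.ones((src.shape[0], 1))])  # (n, 3) design matrix
    # Solve X @ A.T ≈ dst in the least-squares sense
    A_T, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return A_T.T  # (2, 3)

def apply_affine(A, pts):
    """Apply a 2x3 affine matrix to an (n, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ A[:, :2].T + A[:, 2]

# Hypothetical tie-points: the reference is scaled by 2 and shifted by (5, -3)
src = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
dst = 2 * src + np.array([5.0, -3.0])
A = fit_affine(src, dst)
print(np.round(A, 3))
```

With more than three tie-points the least-squares fit averages out small picking errors, and the residuals can be inspected to spot badly placed points.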
Wikipedia article on registration -> register for change detection or image stitching
Phase correlation is used to estimate the XY translation between two images with sub-pixel accuracy. It can be used for accurate registration of low resolution imagery onto high resolution imagery, or to register a sub-image on a full image -> unlike many spatial-domain algorithms, the phase correlation method is resilient to noise, occlusions, and other defects. With additional pre-processing, image rotation and scale changes can also be calculated.
How to Co-Register Temporal Stacks of Satellite Images
image-matching-models -> easily try 23 different image matching methods
ImageRegistration -> Interview assignment for multimodal image registration using SIFT
imreg_dft -> Image registration using discrete Fourier transform. Given two images it can calculate the difference between scale, rotation and position of imaged features.
arosics -> Perform automatic subpixel co-registration of two satellite image datasets using phase-correlation, XY translations only.
SubpixelAlignment -> Implementation of tiff image alignment through phase correlation for pixel- and subpixel-bias
cnn-registration -> An image registration method using convolutional neural network features written in Python2, Tensorflow 1.5
Siamese_ShiftNet -> NN predicting spatial coregistration shift of remote sensing imagery. Adapted from HighRes-net
ImageCoregistration -> Image registration with OpenCV using SIFT and RANSAC
mapalignment -> Aligning and Updating Cadaster Maps with Remote Sensing Images
CVPR21-Deep-Lucas-Kanade-Homography -> deep learning pipeline to accurately align challenging multimodality images. The method is based on traditional Lucas-Kanade algorithm with feature maps extracted by deep neural networks.
eolearn implements phase correlation, feature matching and ECC
Reprojecting the Perseverance landing footage onto satellite imagery
Kornia provides image registration
LoFTR -> Detector-Free Local Feature Matching with Transformers. Good performance matching satellite image pairs; try out the web demo on your data
image-to-db-registration -> This remote module implements an algorithm for automated vector Database registration onto an Image. Implemented in the orfeo-toolbox
MS_HLMO_registration -> Multi-scale Histogram of Local Main Orientation for Remote Sensing Image Registration, with paper
cnn-matching -> Deep learning algorithm for feature matching of cross modality remote sensing images
Imatch-P -> A demo using SuperGlue and SuperPoint to do the image matching task, based on PaddlePaddle
NBR-Net -> A Non-rigid Bi-directional Registration Network for Multi-temporal Remote Sensing Images
MU-Net -> A Multi-Scale Framework with Unsupervised Learning for Remote Sensing Image Registration
unsupervisedDeepHomographyRAL2018 -> Unsupervised Deep Homography applied to aerial data
registration_cnn_ntg -> A Multispectral Image Registration Method Based on Unsupervised Learning
remote-sensing-images-registration-dataset -> at 0.23m, 3.75m & 30m resolution
semantic-template-matching -> A deep learning semantic template matching framework for remote sensing image registration
GMN-Generative-Matching-Network -> Deep Generative Matching Network for Optical and SAR Image Registration
SOMatch -> A deep learning framework for matching of SAR and optical imagery
Interspectral image registration dataset -> including satellite and drone imagery
RISG-image-matching -> A rotation invariant SuperGlue image matching algorithm
DeepAerialMatching_pytorch -> A Two-Stream Symmetric Network with Bidirectional Ensemble for Aerial Image Matching
DPCN -> Deep Phase Correlation for End-to-End Heterogeneous Sensor Measurements Matching
FSRA -> A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization
IHN -> Iterative Deep Homography Estimation
OSMNet -> Explore Better Network Framework for High-Resolution Optical and SAR Image Matching
L2_Siamese -> Registration of Multiresolution Remote Sensing Images Based on L2-Siamese Model
Multi-Step-Deformable-Registration -> Unsupervised Multi-Step Deformable Registration of Remote Sensing Imagery based on Deep Learning
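Phase correlation, described in the list above, works in the frequency domain: a translation between two images becomes a linear phase difference between their Fourier transforms, so normalising the cross-power spectrum and transforming back gives a sharp peak at the offset. The minimal NumPy sketch below recovers integer-pixel shifts only and assumes equal-size images related by a circular (wrap-around) shift; the function name is hypothetical. For sub-pixel accuracy, scikit-image provides `skimage.registration.phase_cross_correlation` with an `upsample_factor` parameter.

```python
import numpy as np

def phase_correlation_shift(ref, moving):
    """Estimate the integer (row, col) shift to apply to `moving`
    (e.g. with np.roll) so that it lines up with `ref`.
    Both images must have the same shape."""
    F_ref = np.fft.fft2(ref)
    F_mov = np.fft.fft2(moving)
    cross_power = F_ref * np.conj(F_mov)
    cross_power /= np.abs(cross_power) + 1e-12  # keep the phase only
    corr = np.real(np.fft.ifft2(cross_power))   # sharp peak at the offset
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape))
    # Peak positions beyond half the image size correspond to negative shifts
    shape = np.array(corr.shape)
    wrap = peak > shape // 2
    peak[wrap] -= shape[wrap]
    return tuple(int(p) for p in peak)

# Simulate a misaligned copy of a random "image"
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
moving = np.roll(ref, shift=(5, -7), axis=(0, 1))
shift = phase_correlation_shift(ref, moving)
aligned = np.roll(moving, shift=shift, axis=(0, 1))
print(shift, np.allclose(aligned, ref))  # → (-5, 7) True
```

Real image pairs are not circularly shifted, so in practice the images are usually windowed (e.g. with a Hann window) before the FFT to reduce edge effects.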
Terrain mapping, Disparity Estimation, Lidar, DEMs & NeRF
Measure surface contours & locate 3D points in space from 2D images. NeRF stands for Neural Radiance Fields and is the term used in deep learning communities to describe a model that generates views of complex 3D scenes based on a partial set of 2D images
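Depth from stereo rests on triangulation: for an idealised rectified pair of pinhole cameras, the depth of a point is inversely proportional to its disparity (the horizontal pixel offset between its two projections), Z = f * B / d, with f the focal length in pixels and B the baseline between the camera centres. The sketch below applies this relation to a disparity map. It is illustrative only: satellite sensors are usually described with rational polynomial camera (RPC) models rather than simple pinhole cameras, and the parameter values are hypothetical.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity map from a rectified stereo pair into depth (metres)
    using Z = f * B / d. Pixels with zero or negative disparity have no valid
    match and are returned as NaN."""
    d = np.asarray(disparity_px, dtype=float)
    depth = np.full(d.shape, np.nan)
    valid = d > 0
    depth[valid] = focal_length_px * baseline_m / d[valid]
    return depth

# Hypothetical 2x2 disparity map (pixels); 0 marks an unmatched pixel
disparity = np.array([[10.0, 20.0], [0.0, 40.0]])
depth = disparity_to_depth(disparity, focal_length_px=1000.0, baseline_m=0.5)
print(depth)
```

Because depth scales with 1/d, a one-pixel disparity error matters much more for distant points than for near ones, which is one reason sub-pixel matching accuracy is important.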
Wikipedia DEM article and phase correlation article
Intro to depth from stereo
Map terrain from stereo images to produce a digital elevation model (DEM) -> high resolution & paired images required, typically 0.3 m, e.g. Worldview
Process of creating a DEM here
ArcGIS can generate DEMs from stereo images
S2P -> S2P is a Python library and command line tool that implements a stereo pipeline which produces elevation models from images taken by high resolution optical satellites such as Pléiades, WorldView, QuickBird, Spot or Ikonos.
Predict the fate of glaciers
monodepth - Unsupervised single image depth prediction with CNNs
Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches
Terrain and hydrological analysis based on LiDAR-derived digital elevation models (DEM) - Python package
Phase correlation in scikit-image
3DCD -> Inferring 3D change detection from bitemporal optical images
The Mapbox API provides images and elevation maps, article here
Reconstructing 3D buildings from aerial LiDAR with Mask R-CNN
ResDepth -> A Deep Prior For 3D Reconstruction From High-resolution Satellite Images
overhead-geopose-challenge -> competition to build computer vision algorithms that can effectively model the height and pose of ground objects for monocular satellite images taken from oblique angles. Blog post MEET THE WINNERS OF THE OVERHEAD GEOPOSE CHALLENGE
cars -> a dedicated and open source 3D tool to produce Digital Surface Models from satellite imaging by photogrammetry. This Multiview stereo pipeline is intended for massive DSM production with a robust and performant design
ImageToDEM -> Generating Elevation Surface from a Single RGB Remotely Sensed Image Using a U-Net for generator and a PatchGAN for the discriminator
IMELE -> Building Height Estimation from Single-View Aerial Imagery
ridges -> deep semantic segmentation model for identifying ridges in topography
planet_tools -> Selection of imagery from Planet API for creation of stereo elevation models
SatelliteNeRF -> PyTorch-based Neural Radiance Fields adapted to satellite domain
SatelliteSfM -> A library for solving the satellite structure from motion problem
SatelliteSurfaceReconstruction -> 3D Surface Reconstruction From Multi-Date Satellite Images, ISPRS, 2021
son2sat -> A neural network coded in TensorFlow 1 that produces satellite images from acoustic images
aerial_mtl -> PyTorch implementation for multi-task learning with aerial images to learn both semantics and height from aerial image datasets; fuses RGB & lidar
ReKlaSat-3D -> 3D Reconstruction and Classification from Very High Resolution Satellite Imagery
M3Net -> A deep learning method for building height estimation using high-resolution multi-view imagery over urban areas
HMSM-Net -> Hierarchical multi-scale matching network for disparity estimation of high-resolution satellite stereo images
StereoMatchingRemoteSensing -> Dual-Scale Matching Network for Disparity Estimation of High-Resolution Remote Sensing Images
satnerf -> Learning Multi-View Satellite Photogrammetry With Transient Objects and Shadow Modeling Using RPC Cameras
SatMVS -> Rational Polynomial Camera Model Warping for Deep Learning Based Satellite Multi-View Stereo Matching
ImpliCity -> reconstructs digital surface models (DSMs) from raw photogrammetric 3D point clouds and ortho-images with the help of an implicit neural 3D scene representation
WHU-Stereo -> a large-scale dataset for stereo matching of high-resolution satellite imagery & several deep learning methods for stereo matching. Methods include StereoNet, Pyramid Stereo Matching Network & HMSM-Net
Photogrammetry-Guide -> A guide covering photogrammetry, including the applications, libraries and tools that will make you a better and more efficient photogrammetry developer
DSM-to-DTM -> Exploring the use of machine learning to convert a Digital Surface Model (e.g. SRTM) to a Digital Terrain Model
GF-7_Stereo_Matching -> Large Scene DSM Generation of Gaofen-7 Imagery Combined with Deep Learning
Mapping drainage ditches in forested landscapes using deep learning and aerial laser scanning
Thermal Infrared
Thermal infrared remote sensing is a technique used to detect and measure thermal radiation emitted from the Earth’s surface. This technique can be used to measure the temperature of the ground and any objects on it and can detect the presence of different materials. Thermal infrared remote sensing is used to assess land cover, detect land-use changes, and monitor urban heat islands, as well as to measure the temperature of the ground during nighttime or in areas of limited visibility.
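A common first step with thermal imagery is converting the sensor's spectral radiance L into brightness temperature by inverting Planck's law, T = K2 / ln(K1 / L + 1). The sketch below assumes Landsat 8 TIRS Band 10 input and uses the thermal constants commonly published for that band; check the metadata of your own scene before relying on them. Brightness temperature is not yet land surface temperature: emissivity and atmospheric corrections are still needed.

```python
import numpy as np

# Thermal conversion constants commonly published in Landsat 8 metadata for
# TIRS Band 10 (assumption: verify against your scene's own metadata).
K1_BAND10 = 774.8853   # W / (m^2 sr µm)
K2_BAND10 = 1321.0789  # Kelvin

def brightness_temperature(radiance, k1=K1_BAND10, k2=K2_BAND10):
    """Top-of-atmosphere brightness temperature (Kelvin) from spectral
    radiance, by inverting Planck's law: T = K2 / ln(K1 / L + 1)."""
    L = np.asarray(radiance, dtype=float)
    return k2 / np.log(k1 / L + 1.0)

# Hypothetical Band 10 radiance values
radiance = np.array([8.0, 10.0, 12.0])
temps_k = brightness_temperature(radiance)
print(np.round(temps_k - 273.15, 1))  # degrees Celsius
```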
The World Needs (a lot) More Thermal Infrared Data from Space
Object_Classification_in_Thermal_Images -> classification accuracy was improved by adding the object size as a feature directly within the CNN
Thermal imaging with satellites blog post by Christoph Rieke
SAR
SAR (synthetic aperture radar) is used to detect and measure the properties of objects and surfaces on the Earth. It can detect changes in terrain, features and objects over time, and measure the size, shape and composition of objects and surfaces. SAR can also be used to measure moisture levels in soil and vegetation, or to detect and monitor changes in land use.
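A simple, classic way to detect changes between two SAR acquisitions is the log-ratio operator. Because SAR speckle is multiplicative, a ratio of the two co-registered intensity images is more robust than a difference, and expressing it in decibels makes increases and decreases in backscatter symmetric. The sketch below assumes calibrated, co-registered intensity images in linear units. The 3 dB threshold is an arbitrary illustrative value, and in practice the images are usually despeckled first (see the despeckling tools below).

```python
import numpy as np

def log_ratio_change_map(before, after, threshold_db=3.0):
    """Log-ratio change detection between two SAR acquisitions.

    before, after: co-registered backscatter intensity images in linear
    units (e.g. calibrated sigma0), with the same shape.
    threshold_db: illustrative change threshold in decibels (assumption).
    Returns the log-ratio image in dB and a boolean change mask.
    """
    eps = 1e-10  # guard against division by zero / log(0) in no-data pixels
    b = np.asarray(before, dtype=float)
    a = np.asarray(after, dtype=float)
    log_ratio_db = 10.0 * np.log10((a + eps) / (b + eps))
    # Both strong decreases and strong increases in backscatter count as change
    change_mask = np.abs(log_ratio_db) > threshold_db
    return log_ratio_db, change_mask

# Toy 2x2 example with hypothetical sigma0 values
before = np.array([[0.10, 0.10], [0.05, 0.20]])
after = np.array([[0.10, 0.01], [0.05, 0.80]])
log_ratio, mask = log_ratio_change_map(before, after)
print(mask)
```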
awesome-sar
awesome-sar-deep-learning
MERLIN -> self-supervised training of deep despeckling networks with MERLIN
You do not need clean images for SAR despeckling with deep learning -> How Speckle2Void learned to stop worrying and love the noise
PySAR - InSAR (Interferometric Synthetic Aperture Radar) timeseries analysis in python
Synthetic Aperture Radar (SAR) Analysis With Clarifai
Labeled SAR imagery dataset of ten geophysical phenomena from Sentinel-1 wave mode consists of more than 37,000 SAR vignettes divided into ten defined geophysical categories
Deep Learning and SAR Applications
Implementing an Ensemble Convolutional Neural Network on Sentinel-1 Synthetic Aperture Radar data and Sentinel-3 Radiometric data for detecting forest fires
s1_parking_occupancy -> PARKING OCCUPANCY ESTIMATION ON SENTINEL-1 IMAGES
Experiments on Flood Segmentation on Sentinel-1 SAR Imagery with Cyclical Pseudo Labeling and Noisy Student Training
SpaceNet_SAR_Buildings_Solutions -> The winning solutions for the SpaceNet 6 Challenge
Mapping and monitoring of infrastructure in desert regions with Sentinel-1
xView3 is a competition to detect dark vessels using computer vision and global SAR satellite imagery. See the first place solution and second place solution. Additional places up to fifth place are available at the xView GitHub Organization page
Winners of the STAC Overflow: Map Floodwater from Radar Imagery competition
deSpeckNet-TF-GEE -> deSpeckNet: Generalizing Deep Learning Based SAR Image Despeckling
cnn_sar_image_classification -> CNN for classifying SAR images of the Amazon Rainforest
s1_icetype_cnn -> Retrieve sea ice type from Sentinel-1 SAR with CNN
MP-ResNet -> Multi-path Residual Network for the Semantic segmentation of PolSAR Images
TGRS_DisOptNet -> Distilling Semantic Knowledge from Optical Images for Weather-independent Building Segmentation
SAR_CD_DDNet -> PyTorch implementation of Change Detection in Synthetic Aperture Radar Images Using a Dual Domain Network
SAR_CD_MS_CapsNet -> Change Detection in SAR Images Based on Multiscale Capsule Network
Toushka Waterbodies Segmentation from four different combinations of Sentinel-1 SAR imagery and Digital Elevation Model with Pytorch and U-net. -> code
sar_transformer -> Transformer based SAR image despeckling, trained with synthetic imagery, with paper
SSDD ship detection dataset
Ship Detection on Remote Sensing Synthetic Aperture Radar Data -> based on the architectures of the Faster-RCNN and YOLOv5 networks
Target Recognition in SAR -> Identify Military Vehicles in Satellite Imagery with TensorFlow, with article
DSN -> Deep SAR-Net: Learning objects from signals
SAR_denoising -> project on application of FFDNet to SAR images
cnninsar -> CNN-Based InSAR Denoising and Coherence Metric
sar -> Despeckling Synthetic Aperture Radar Images using a Deep Residual CNN
GCBANet -> A Global Context Boundary-Aware Network for SAR Ship Instance Segmentation
SAR_CD_GKSNet -> Change Detection from Synthetic Aperture Radar Images via Graph-Based Knowledge Supplement Network
pixel-wise-segmentation-of-sar -> Pixel-Wise Segmentation of SAR Imagery Using Encoder-Decoder Network and Fully-Connected CRF
SAR_Ship_detection_CFAR -> An improved two-parameter CFAR algorithm based on Rayleigh distribution and Mathematical Morphology for SAR ship detection
sar_snow_melt_timing -> notebooks and tools to identify snowmelt timing using timeseries analysis of backscatter of Sentinel-1 C-band SAR
Denoising radar satellite images using deep learning in Python -> Medium article on deepdespeckling
random-wetlands -> Random forest classification for wetland vegetation from synthetic aperture radar dataset
AGSDNet -> AGSDNet: Attention and Gradient-Based SAR Denoising Network
LFG-Net -> LFG-Net: Low-Level Feature Guided Network for Precise Ship Instance Segmentation in SAR Images
sar_sift -> Image registration algorithm
SAR-Despeckling -> toolbox
cogsima2022 -> Enhancing land subsidence awareness via InSAR data and Deep Transformers
XAI4SAR-PGIL -> Physically Explainable CNN for SAR Image Classification
PolSARFormer -> Local Window Attention Transformer for Polarimetric SAR Image Classification
DC4Flood -> A deep clustering framework for rapid flood detection using Sentinel-1 SAR imagery
Sentinel1-Flood-Finder -> Flood Finder Package from Sentinel 1 Imagery
NDVI - vegetation index
Normalized Difference Vegetation Index (NDVI) is an index used to measure the amount of healthy vegetation in a given area. It is calculated by taking the difference between the near-infrared (NIR) and red bands of a satellite image, and dividing by the sum of the two bands: ndvi = np.true_divide((ir - r), (ir + r)). Values range from -1 to 1; healthy vegetation reflects strongly in the NIR and absorbs red light, so it gives high positive values. NDVI can therefore be used to identify areas of vegetation and to assess their health.
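The one-line formula above works as long as the bands are floating point. The sketch below is a slightly more defensive version, assuming the NIR and red bands have already been read into NumPy arrays (for example with Rasterio); the function name is hypothetical. It casts to float to avoid unsigned-integer wrap-around (satellite bands are often stored as uint16, where NIR - Red cannot go negative) and sets pixels with a zero denominator to NaN.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red), with NaN where NIR + Red == 0."""
    nir = np.asarray(nir, dtype=np.float64)  # cast first: uint16 would wrap around
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Only divide where the denominator is non-zero; other pixels stay NaN
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Hypothetical uint16 digital numbers for a 2x2 tile
nir = np.array([[4000, 1000], [0, 500]], dtype=np.uint16)
red = np.array([[1000, 3000], [0, 500]], dtype=np.uint16)
result = ndvi(nir, red)
print(result)
```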
Example notebook local
Landsat data in Cloud Optimized GeoTIFF (COG) format analysed for NDVI, with Medium article here.
Identifying Buildings in Satellite Images with Machine Learning and Quilt -> NDVI & edge detection via gaussian blur as features, fed to TPOT for training with labels from OpenStreetMap, modelled as a two class problem, “Buildings” and “Nature”
Seeing Through the Clouds - Predicting Vegetation Indices Using SAR
A walkthrough on calculating NDWI water index for flooded areas -> Derive zonal statistics from Sentinel 2 images using Rasterio and Geopandas
NDVI-Net -> NDVI-Net: A fusion network for generating high-resolution normalized difference vegetation index in remote sensing
Awesome-Vegetation-Index
Remote-Sensing-Indices-Derivation-Tool -> Calculate spectral remote sensing indices from satellite imagery
General image quality
Image quality describes the degree of accuracy with which an image can represent the original object. Image quality is typically measured by the amount of detail, sharpness, and contrast that an image contains. Factors that contribute to image quality include the resolution, format, and compression of the image.
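Sharpness and contrast can be given rough numerical proxies. The sketch below uses two simple no-reference heuristics: RMS contrast (the standard deviation of pixel intensities) and the variance of the Laplacian as a sharpness score, since blurring weakens edges and lowers it. These are illustrative choices, not the learned metrics listed below, and the function names are hypothetical.

```python
import numpy as np

def rms_contrast(img):
    """RMS contrast: standard deviation of pixel intensities."""
    return float(np.asarray(img, dtype=float).std())

def laplacian_variance(img):
    """Sharpness proxy: variance of a 4-neighbour Laplacian response,
    computed on the valid interior only (no padding)."""
    x = np.asarray(img, dtype=float)
    lap = (x[:-2, 1:-1] + x[2:, 1:-1] + x[1:-1, :-2] + x[1:-1, 2:]
           - 4.0 * x[1:-1, 1:-1])
    return float(lap.var())

def box_blur(img):
    """3x3 mean filter applied to the image interior (used for the demo)."""
    x = np.asarray(img, dtype=float)
    h, w = x.shape
    out = x.copy()
    out[1:-1, 1:-1] = sum(x[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return out

rng = np.random.default_rng(42)
sharp = rng.random((128, 128))
blurred = box_blur(sharp)
print(laplacian_variance(sharp) > laplacian_variance(blurred))  # → True
```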
lvrnet -> Lightweight Image Restoration for Aerial Images under Low Visibility
jitter-compensation -> Remote Sensing Image Jitter Detection and Compensation Using CNN
DeblurGANv2 -> Deblurring (Orders-of-Magnitude) Faster and Better
image-quality-assessment -> CNN to predict the aesthetic and technical quality of images
piq -> a collection of measures and metrics for image quality assessment
FFA-Net -> Feature Fusion Attention Network for Single Image Dehazing
DeepCalib -> A Deep Learning Approach for Automatic Intrinsic Calibration of Wide Field-of-View Cameras
PerceptualSimilarity -> LPIPS is a perceptual metric which aims to overcome the limitations of traditional metrics such as PSNR & SSIM, to better represent the features the human eye picks up on
Hyperspectral-Deblurring-and-Destriping
HyDe -> Hyperspectral Denoising algorithm toolbox in Python
HLF-DIP -> Unsupervised Hyperspectral Denoising Based on Deep Image Prior and Least Favorable Distribution
RQUNetVAE -> Riesz-Quincunx-UNet Variational Auto-Encoder for Satellite Image Denoising
deep-hs-prior -> Deep Hyperspectral Prior: Denoising, Inpainting, Super-Resolution
iquaflow -> from Satellogic, an image quality framework that aims at providing a set of tools to assess image quality by using the performance of AI models trained on the images as a proxy.
Synthetic data
Training data can be hard to acquire, particularly for rare events such as change detection after disasters, or imagery of rare classes of objects. In these situations, generating synthetic training data might be the only option. This has become quite sophisticated, with 3D models being used with open source game engines such as Unreal.
The Synthinel-1 dataset: a collection of high resolution synthetic overhead imagery for building segmentation with repo
RarePlanes -> incorporates both real and synthetically generated satellite imagery including aircraft. Read the arXiv paper and check out this repo. Note the dataset is available through the AWS Open-Data Program for free download
Read this article from NVIDIA which discusses fine tuning a model pre-trained on synthetic data (Rareplanes) with 10% real data, then pruning the model to reduce its size, before quantizing the model to improve inference speed
Combining Synthetic Data with Real Data to Improve Detection Results in Satellite Imagery
BlenderGIS could be used for synthetic data generation
bifrost.ai -> simulated data service with geospatial output data formats
oktal-se -> software for generating simulated data across a wide range of bands including optical and SAR
rendered.ai -> The Platform as a Service for Creating Synthetic Data
synthetic_xview_airplanes -> creation of airplanes synthetic dataset using ArcGIS CityEngine
Combining Synthetic Data with Real Data to Improve Detection Results in Satellite Imagery: Case Study
Import OpenStreetMap data into Unreal Engine 4
deepfake-satellite-images -> dataset that includes over 1M images of synthetic aerial images
synthetic-disaster -> Generate synthetic satellite images of natural disasters using deep neural networks
STPLS3D -> A Large-Scale Synthetic and Real Aerial Photogrammetry 3D Point Cloud Dataset
LESS -> LargE-Scale remote sensing data and image Simulation framework over heterogeneous 3D scenes
Synthesizing Robustness: Dataset Size Requirements and Geographic Insights -> Medium article, concludes that synthetic data is most beneficial to the rarest object classes and that extracting utility from synthetic data often takes significant effort and creativity
rs_img_synth -> Synthesizing Optical and SAR Imagery From Land Cover Maps and Auxiliary Raster Data
OnlyPlanes -> dataset and pretrained models for the paper: OnlyPlanes - Incrementally Tuning Synthetic Training Datasets for Satellite Object Detection
awesome-remote-sensing-vision-language-models
Awesome-Remote-Sensing-Multimodal-Large-Language-Model
Remote-Sensing-ChatGPT -> an open source tool for solving remote sensing tasks with ChatGPT in an interactive way.
ChangeCLIP -> ChangeCLIP: Remote sensing change detection with multimodal vision-language representation learning
SkyEyeGPT -> SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
RemoteCLIP -> A Vision Language Foundation Model for Remote Sensing
GeoChat -> Grounded Large Vision-Language Model for Remote Sensing
labs-gpt-stac -> connect ChatGPT to a STAC API backend
EarthGPT -> A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
H2RSVLM -> Towards Helpful and Honest Remote Sensing Large Vision Language Model
LLMs & FMs in Smart Agriculture -> Large Language Models and Foundation Models in Smart Agriculture: Basics, Opportunities, and Challenges
LHRS-Bot -> Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Awesome-VLGFM -> Towards Vision-Language Geo-Foundation Models: A Survey
Awesome Remote Sensing Foundation Models
Clay Foundation Model -> an open source AI model and interface for Earth.
TerraTorch -> a Python toolkit for fine-tuning Geospatial Foundation Models from IBM, based on PyTorch Lightning and TorchGeo
EarthPT -> A time series foundation model for Earth Observation
SpectralGPT -> Spectral remote sensing foundation model, with finetuning on classification, segmentation, and change detection tasks
DOFA-pytorch -> Dynamic One-For-All (DOFA) multimodal foundation models for Earth vision reference implementation
Prithvi foundation model -> also see the Baseline Model for Segmentation
prithvi-pytorch -> makes Prithvi usable from Pytorch Lightning
geo-bench -> a General Earth Observation benchmark for evaluating the performances of large pre-trained models on geospatial data
USat -> A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery
hydro-foundation-model -> A Foundation Model for Water in Satellite Imagery
RSBuilding -> Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model
Text2Seg -> a pipeline that combines multiple Vision Foundation Models (SAM, CLIP, GroundingDINO) to perform semantic segmentation.
Remote-Sensing-RVSA -> Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model
FoMo-Bench -> a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models
MTP -> Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
DiffusionSat -> A Generative Foundation Model For Satellite Imagery
granite-geospatial-biomass -> A geospatial model for Above Ground Biomass from IBM
Digital image processing
- Masters Thesis
- Ha, Vinh Thuc
- Wong, Robert
- Bavarian, Behzad
- Electrical and Computer Engineering
- California State University, Northridge
- Dissertations, Academic -- CSUN -- Engineering.
- 2017-04-11T18:16:09Z
- http://hdl.handle.net/10211.3/189576
- by Vinh Thuc Ha
- California State University, Northridge. Department of Engineering.
- Includes bibliographical references (page 59)
Satellite Image Processing - Science topic
Title: A systematic review of the use of deep learning in satellite imagery for agriculture
Abstract: Agricultural research is essential for increasing food production to meet the requirements of an increasing population in the coming decades. Recently, satellite technology has been improving rapidly and deep learning has seen much success in generic computer vision tasks and many application areas which presents an important opportunity to improve analysis of agricultural land. Here we present a systematic review of 150 studies to find the current uses of deep learning on satellite imagery for agricultural research. Although we identify 5 categories of agricultural monitoring tasks, the majority of the research interest is in crop segmentation and yield prediction. We found that, when used, modern deep learning methods consistently outperformed traditional machine learning across most tasks; the only exception was that Long Short-Term Memory (LSTM) Recurrent Neural Networks did not consistently outperform Random Forests (RF) for yield prediction. The reviewed studies have largely adopted methodologies from generic computer vision, except for one major omission: benchmark datasets are not utilised to evaluate models across studies, making it difficult to compare results. Additionally, some studies have specifically utilised the extra spectral resolution available in satellite imagery, but other divergent properties of satellite images - such as the hugely different scales of spatial patterns - are not being taken advantage of in the reviewed studies.
Comments: 23 pages, 5 figures and 10 tables in main paper. Supplementary materials section also included in main pdf. Update: All tables with specific references have been moved to the supplementary materials. Main text now uses only aggregated information.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Learning efficient image processing pipelines
- Doctoral thesis, MIT (DSpace@MIT)
www.matlabsimulation.com: Satellite Image Processing Projects
Satellite image processing projects are a good way to build research skills. Satellite image processing is a broad topic in which satellite images are analyzed and improved, and it is an important field of study within image processing. It attracts many researchers because of its wide range of real-world applications. Below are some strengths of satellite imaging and some topics for satellite image processing projects.
Advantages of Satellite Imaging
- High spatial resolution and wide coverage
- Terrain and ocean imaging
- Multispectral and multitemporal acquisition
- Less affected by weather conditions
- Clear and precise images
- Easy georectification
- And more
Topics Under Satellite Image Processing Projects
- Forest carbon flux estimation in Landsat images
- Relative radiometric correction in CCD images
- Image fusion for object-oriented mapping
- Land and vegetation cover estimation
- Land classification in multispectral and hyperspectral images
- Content-based registration
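A common first step for the land and vegetation cover estimation topic is the Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red), computed per pixel. The sketch below assumes the red and near-infrared bands are already co-registered, stored as equally sized 2D lists, and scaled to reflectance in [0, 1]. The 0.3 vegetation threshold is a hypothetical illustration, not a standard value.

```python
def ndvi(nir, red, eps=1e-9):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red) for two equally sized 2D bands."""
    result = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            # Guard against division by zero where both bands are dark.
            row.append((n - r) / total if abs(total) > eps else 0.0)
        result.append(row)
    return result


def vegetation_fraction(ndvi_map, threshold=0.3):
    """Fraction of pixels whose NDVI exceeds a (hypothetical) vegetation threshold."""
    values = [v for row in ndvi_map for v in row]
    return sum(v > threshold for v in values) / len(values)


# Toy 2x2 reflectance bands (illustrative values, not real satellite data).
nir = [[0.50, 0.45], [0.10, 0.30]]
red = [[0.10, 0.05], [0.10, 0.25]]
m = ndvi(nir, red)
print(vegetation_fraction(m))  # 0.5
```

In practice the bands would come from a real product (for example, Landsat red and near-infrared bands), and the threshold would be chosen per scene and sensor.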
![satellite image processing thesis titles Satellite Image Processing Projects Using Matlab Simulink](https://matlabsimulation.com/wp-content/uploads/2020/11/Satellite-Image-Processing-Projects.png)
Beyond these topics, there are many other satellite-based projects. Plant disease detection is currently a key research area, since food plays a vital role in daily human life, and satellite images are increasingly used for it: the images are passed through image processing steps to detect disease. Here is a quick tour of plant diseases detected using satellite images:
Plant Diseases Detected Using Satellite Image
- Coffee berry necrosis
- Texas root rot
- Soilborne and seedborne diseases
- Citrus greening
- Olive tree stress
- Wheat yellow rust
Datasets are key to testing satellite image processing methods, and choosing a satellite image processing project is generally not an easy task. A few widely used datasets are listed below.
Large Datasets
- Vegetation Whole Globe Products
- Landsat
Applications of Satellite Imaging
- Mineral deposit extraction
- Disaster mitigation planning and recovery
- Volcano and hazard monitoring
- Agriculture and biodiversity conservation
- Forestry and fisheries harvesting
- Seismic risk assessment
Satellite Communications
Space Systems Operations Thesis Research Guide
- Space Systems Operations (SSO) Thesis Research Guide: This guide has been prepared for SSO students to assist them in researching and writing their master’s thesis. It outlines the specific steps necessary to successfully complete a high-quality thesis. It has been written specifically for you, the student. See the SSO Thesis Guide link under the Thesis Documents and Topics section.
Satellite Communication Theses
![satellite image processing thesis titles NPS Archive: Calhoun](https://d2jv02qf7xgjwx.cloudfront.net/customers/598/images/Calhoun.png)
NPS Theses & Dissertations
You will find links to both archives (Public and Restricted).
- NPS Theses: Go to this page to access both the NPS Archive: Calhoun (publicly releasable) and the Restricted Collection (Controlled Unclassified Information)
NPS Theses & Dissertations (publicly releasable)
- NPS Archive: Calhoun
Find publicly releasable NPS-authored scholarly works (Theses, Dissertations, Capstone Project Reports, MBA Professional Reports, Joint Applied Project Reports, and other NPS degree-earning written works)
NPS Outstanding Theses (publicly releasable)
- NPS Outstanding Theses
Outstanding Thesis recognitions are given to students who have exhibited exceptional quality in their thesis research work.
- Publicly releasable only
- This list is school-wide, but you can filter by Department using the dropdown menus
NPS Theses & Dissertations (restricted)
NPS-authored scholarly works (NPS Theses, Dissertations, Capstone Project Reports, MBA Professional Reports, Joint Applied Project Reports, and other degree-earning works)
Who can access?
To request an account, you must be both of these:
- Current NPS member
- Federal employee/contractor or U.S. military
If asked to log in to Microsoft, use your NPS email address.
Access available to U.S. citizens only
Already have an account?
Search the Restricted Collection
Are you a new user?
Request an Account
- Restricted Resources: the official archive for restricted NPS Theses, Dissertations, Capstone Project Reports, and other documents.
Non-NPS Theses, Dissertations & Reports
- Defense Technical Information Center (DTIC) (publicly releasable): Publicly releasable documents only, in all areas of defense research (science, technology, engineering, social science, business & budget), and DOD directives & instructions. DTIC is the central scientific, research, and engineering information support activity for the DOD. Each product listed on the page must be searched separately, and this archive has limited search filtering capabilities. For a better search experience, use the National Technical Reports Library or, if you are eligible, register for the DTIC R&E Gateway and use the Advanced Search to filter for "A - Approved for Public Release" documents.
- E-Theses Online Service (EThOS): UK’s national thesis service for doctoral research theses
- National Technical Reports Library (National Technical Information Service): Scientific, technical, engineering & related business information by or for the U.S. government & complementary materials from international sources. Includes publicly accessible DTIC reports in its Technical Reports Library. NTIS has a more fully functioning search interface.
- Networked Digital Library of Theses and Dissertations (NDLTD)
- OAIster: Multidisciplinary, academically oriented digital resources including theses and dissertations
- Open Access Theses and Dissertations (OATD): Open access graduate theses and dissertations published around the world
- PQDT Open (ProQuest): Open access dissertations and theses
- Last Updated: Jul 22, 2024 11:48 AM
- URL: https://libguides.nps.edu/satcom
M.Tech/Ph.D Thesis Help in Chandigarh | Thesis Guidance in Chandigarh
What is Digital Image Processing?
Digital image processing is the use of computer algorithms to process digital images, and the latest research and thesis topics in the field are built on these algorithms. As a subcategory of digital signal processing, digital image processing has many advantages over analog image processing: it allows a much wider range of algorithms to be applied to the input data and avoids problems such as the build-up of noise and signal distortion during processing. Because images are defined over two or more dimensions, digital image processing can be modeled in the form of multidimensional systems. The history of digital image processing dates back to the early 1920s, when its first application was reported. Many students choose this field for their M.Tech and Ph.D. theses, and there are various thesis topics in digital image processing for M.Tech, M.Phil and Ph.D. students; some are listed here. Before choosing a topic, you should have some basic knowledge of image processing.
![satellite image processing thesis titles image-processing](https://techsparks.co.in/wp-content/uploads/2020/12/image-processing.jpg)
Latest research topics in image processing for research scholars:
- A hybrid classification scheme for plant disease detection
- An edge detection scheme using ant and bee colony optimization
- An improved PNLM filtering scheme for denoising MRI images
- A classification method for brain tumor detection
- A CNN approach for lung cancer detection
- A neural network method for diabetic retinopathy detection
- A copy-move forgery detection approach using texture feature extraction
- A face spoof detection method based on eigenfeature extraction and classification
- A classification and segmentation method for number plate detection
Formation of Digital Images
First, the image is captured by a camera, using a light source such as sunlight as the source of energy. A sensor array acquires the image: when light falls on an object, the sensors measure the amount of light it reflects and generate a continuous voltage signal from the sensed data. This signal is then converted into digital form by sampling and quantization, producing a two-dimensional array of numbers: the digital image.
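Sampling and quantization can be illustrated on a single image row. In the minimal sketch below, the sensor voltage is modeled as a hypothetical function of position, sampled at uniform intervals and then quantized to 8-bit integer codes, as an idealized analog-to-digital converter would do. Repeating this for every row yields the 2D array described above.

```python
import math


def sample_and_quantize(signal, duration, num_samples, bits=8, v_min=0.0, v_max=1.0):
    """Sample a continuous signal at uniform instants and quantize each sample.

    signal: a function of position/time returning a voltage in [v_min, v_max].
    Returns a list of integer codes in the range [0, 2**bits - 1].
    """
    levels = 2 ** bits
    step = duration / num_samples            # sampling interval
    codes = []
    for k in range(num_samples):
        v = signal(k * step)                 # sampling: evaluate at discrete point k*step
        v = min(max(v, v_min), v_max)        # clip to the converter's input range
        # Quantization: map the continuous value to one of `levels` integer codes.
        codes.append(round((v - v_min) / (v_max - v_min) * (levels - 1)))
    return codes


# Hypothetical sensor output: a smoothly varying brightness profile along one row.
row = sample_and_quantize(lambda t: 0.5 + 0.5 * math.sin(2 * math.pi * t),
                          duration=1.0, num_samples=8)
print(row)
```

More bits give finer intensity levels, and more samples give finer spatial resolution; both choices trade accuracy against storage.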
Why is Image Processing Required?
Image processing serves the following main purposes:
- Visualization of hidden objects in the image.
- Enhancement of the image through sharpening and restoration.
- Extraction of valuable information from the image.
- Measurement of patterns of objects in the image.
- Distinguishing different objects in the image.
Applications of Digital Image Processing
There are various applications of digital image processing, each of which can also be a good thesis topic. The main applications are:
- Image quality enhancement through techniques such as sharpening and restoration, so that images can be altered to achieve the desired results.
- Medical imaging, including gamma-ray imaging, PET scans, X-ray imaging and UV imaging.
- Image transmission and encoding.
- Color processing, in which colored images are processed using different color spaces.
- Pattern recognition in machine learning.
List of topics in image processing for thesis and research
There are various topics in digital image processing for thesis and research. Here is a list of the latest thesis and research topics:
- Image Acquisition
- Image Enhancement
- Image Restoration
- Color Image Processing
- Wavelets and Multi Resolution Processing
- Compression
- Morphological Processing
- Segmentation
- Representation and Description
- Object recognition
- Knowledge Base
1. Image Acquisition:
Image acquisition is the first and a crucial step of digital image processing. In its simplest form, the image is already in digital form and acquisition involves only preprocessing such as scaling. Otherwise, it starts with an image being captured by a sensor (such as a monochrome or color TV camera) and digitized; if the output of the camera or sensor is not already digital, an analog-to-digital converter (ADC) digitizes it. If the image is not acquired properly, the later processing tasks cannot achieve their goals. Advanced image acquisition techniques use customized hardware, and 3D image acquisition is one such advanced method. Students can choose it for their master’s thesis and research.
2. Image Enhancement:
Image enhancement is one of the simplest and most important areas of digital image processing. The core idea is to bring out detail that is obscured, or to highlight specific features of an image as required, for example by changing brightness and contrast. It involves manipulating an image so that the result is more suitable than the original for a specific application. Many algorithms have been designed to change an image’s contrast, brightness and other properties. Image enhancement aims to improve how humans perceive the image, and its techniques fall into two types: spatial domain and frequency domain.
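Linear contrast stretching is one of the simplest spatial-domain enhancement operations. It maps the darkest and brightest pixels of the input to the ends of the output range. The minimal sketch below assumes a grayscale image stored as a 2D list and an 8-bit output range.

```python
def contrast_stretch(image, out_min=0, out_max=255):
    """Linearly map the image's intensity range [lo, hi] onto [out_min, out_max].

    image: 2D list of grayscale intensities. Returns a new 2D list of ints.
    """
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # a flat image has no contrast to stretch
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in image]


# A low-contrast image whose intensities only span 100..130.
low_contrast = [[100, 110, 120],
                [105, 115, 125],
                [110, 120, 130]]
stretched = contrast_stretch(low_contrast)
print(stretched)
```

Because the mapping is applied pixel by pixel, this is a point operation in the spatial domain. Frequency-domain enhancement would instead modify the image's Fourier coefficients.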
3. Image Restoration:
Image restoration aims to improve the appearance of an image. Whereas image enhancement is subjective, image restoration is objective: restoration techniques are based on mathematical or probabilistic models of image degradation. Restoration removes blur and noise to recover a clean estimate of the original image, and it can be a good choice for an M.Tech thesis on image processing. The information lost during blurring is recovered by reversing the degradation, which distinguishes restoration from enhancement. Deconvolution, often performed in the frequency domain, is a common technique for this, and it corrects the main defects that degrade an image.
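Frequency-domain deconvolution can be sketched in one dimension. The sketch below assumes a known blur kernel (point spread function) applied by circular convolution, so the degraded signal satisfies Y = H·X in the Fourier domain. It uses a Wiener-style estimate X̂ = conj(H)·Y / (|H|² + k) with a constant k standing in for the noise-to-signal ratio. With k = 0 this reduces to plain inverse filtering. The kernel and signal are hypothetical, and the naive O(n²) DFT is only for readability.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for a short illustrative signal)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_blur(x, kernel):
    """Degradation model: y = h (*) x using circular convolution."""
    n = len(x)
    return [sum(kernel[j] * x[(t - j) % n] for j in range(len(kernel))) for t in range(n)]


def wiener_deconvolve(y, kernel, k=0.0):
    """Estimate x from y via X_hat = conj(H) * Y / (|H|^2 + k); k=0 is inverse filtering."""
    n = len(y)
    h = list(kernel) + [0.0] * (n - len(kernel))  # zero-pad the PSF to the signal length
    H, Y = dft(h), dft(y)
    X_hat = [Hk.conjugate() * Yk / (abs(Hk) ** 2 + k) for Hk, Yk in zip(H, Y)]
    return [v.real for v in idft(X_hat)]


original = [0.0, 0.0, 1.0, 4.0, 1.0, 0.0, 2.0, 0.0]
blur = [0.6, 0.3, 0.1]  # hypothetical 3-tap point spread function
observed = circular_blur(original, blur)
restored = wiener_deconvolve(observed, blur, k=0.0)
print([round(v, 6) for v in restored])
```

With noisy observations, a small positive k keeps frequencies where |H| is tiny from amplifying the noise. This is the trade-off that separates Wiener filtering from naive inversion.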
4. Color Image Processing:
Color image processing has become of great interest because of the significant increase in the use of digital images on the Internet. It includes color modeling and processing in the digital domain. Various color models specify a color as a point in a 3D coordinate system, including the RGB, CMY, HSI and YIQ models. Color processing matters because humans can perceive thousands of colors. There are two areas of color image processing: full-color processing, in which the image is processed in full color, and pseudo-color processing, in which grayscale images are converted to color images. It is an interesting topic in image processing.
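Three of the color models named above can be compared on a single pixel. The sketch below assumes RGB components scaled to [0, 1]. CMY is computed as the subtractive complement of RGB. HSI follows the commonly taught geometric formulation, with hue in degrees. YIQ comes from Python's standard `colorsys` module. Treating hue as 0 for black and gray pixels is a convention chosen here, since hue is undefined there.

```python
import colorsys
import math


def rgb_to_cmy(r, g, b):
    """CMY is the subtractive complement of RGB (components in [0, 1])."""
    return (1.0 - r, 1.0 - g, 1.0 - b)


def rgb_to_hsi(r, g, b):
    """RGB -> HSI; hue in degrees [0, 360), saturation and intensity in [0, 1]."""
    total = r + g + b
    intensity = total / 3.0
    if total == 0:  # black: hue and saturation undefined, report 0
        return (0.0, 0.0, 0.0)
    saturation = 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:  # gray: hue undefined, report 0
        return (0.0, saturation, intensity)
    theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
    hue = theta if b <= g else 360.0 - theta
    return (hue, saturation, intensity)


pixel = (1.0, 0.0, 0.0)  # pure red
print(rgb_to_cmy(*pixel))
print(rgb_to_hsi(*pixel))
print(colorsys.rgb_to_yiq(*pixel))  # YIQ conversion from the standard library
```

HSI separates intensity from the chromatic components, which is why it is often preferred when an algorithm should act on brightness without shifting hue.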
![satellite image processing thesis titles satellite image processing thesis titles](https://www.techsparks.co.in/dev/wp-content/uploads/2020/12/color-image-processing-2.jpg)
THESIS ON IMAGE PROCESSING
Many research scholars choose their thesis work in the domain of image processing. Image processing operates on images and produces images that improve the visibility of features and facilitate subsequent analysis, for example by enhancing raw images received from sources such as cameras and sensors. An image processing thesis depends on both the implementation and the paper preparation.
Image Processing operations:
- Enhancement of important image details while suppressing other information.
- Correction of image imperfections and defects.
- Creation of binary representations.
- Measurement of image features.
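Binary representations are usually created by thresholding. Otsu's method, shown here as one standard way to pick the threshold automatically, chooses the gray level that maximizes the between-class variance of the resulting foreground and background. The minimal sketch below assumes 8-bit integer grayscale pixels. The toy image is hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t maximizing between-class variance (Otsu's method).

    pixels: flat list of integer intensities in [0, levels - 1].
    Pixels with value > t are treated as foreground.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]  # number of pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance up to a constant 1/total^2 factor (does not change argmax).
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(image, t):
    """Binary representation: 1 where intensity > t, else 0."""
    return [[1 if p > t else 0 for p in row] for row in image]


img = [[10, 12, 11],
       [200, 210, 205],
       [12, 205, 11]]
t = otsu_threshold([p for row in img for p in row])
mask = binarize(img, t)
print(t, mask)
```

The resulting mask can then feed feature measurements such as object area or perimeter.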
Applications of Image Processing:
- Forensic Studies.
- Military Applications.
- Document Processing.
- Non-Destructive Evaluation.
- Graphic Arts.
Thesis Topics for Image Processing:
- Content based Image Retrieval
- Intelligent Transportation System
- Satellite Imaging
- Forensic Image Processing
Content-Based Image Retrieval
- Search analyzes the contents of the image itself rather than metadata such as keywords, tags or descriptions associated with the image
- Contents may take the form of shape, color and texture
Recent CBIR issues
- Image Blocking
- Color and Texture Feature Derivation
- Color Indexing
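Color indexing can be illustrated with a minimal CBIR sketch. Each image is summarized by a normalized, coarsely quantized RGB histogram, and the database is ranked by histogram intersection, a classic color-indexing similarity measure. The "images" here are hypothetical lists of RGB tuples. A real system would read pixels from files and usually add texture and shape features.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Quantize each RGB channel (0-255) into bins; return a normalized joint histogram."""
    size = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        idx = ((r // size) * bins_per_channel + (g // size)) * bins_per_channel + (b // size)
        hist[idx] += 1
    n = len(pixels)
    return [h / n for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for normalized histograms; 1 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def retrieve(query_pixels, database, top_k=2):
    """Rank database images (name -> pixel list) by color similarity to the query."""
    q = color_histogram(query_pixels)
    scores = {name: histogram_intersection(q, color_histogram(px))
              for name, px in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical tiny "images" given as pixel lists.
database = {
    "sunset": [(250, 120, 30)] * 6 + [(200, 60, 20)] * 2,
    "forest": [(20, 140, 40)] * 7 + [(90, 70, 30)],
    "sea": [(10, 60, 200)] * 8,
}
query = [(240, 110, 40)] * 5 + [(190, 70, 10)] * 3
results = retrieve(query, database)
print(results)
```

Coarser bins make matching more tolerant of lighting changes but less discriminative. The bin count is a design parameter worth tuning per collection.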
Security
- Security can be maintained using cryptography, steganography and watermarking techniques
Intelligent Transportation System
- Video retrieval for intelligent transportation systems
Satellite Imaging
- Geographic information system (GIS) based concepts developed in MATLAB simulation
- Research problems based on remote sensing input images such as SAR, RADAR, SONAR and ASTER
Forensic Image Processing
- Involves the computer restoration and enhancement of surveillance imagery
- To maximize information extraction from surveillance imagery
Methods of Image Processing
- Adaptive Transformations
- Watermarking
- Sparse Representation
- Dictionary learning
Adaptive Transformation
- Adaptive transformation methods build on both classical methods and adaptive methods
- The components of an adaptive method are a stepping rule, a stopping criterion and a threshold estimate
- Performance is measured by bias, precision and efficiency
Watermarking
- Embedding data into an image
- The data must be stored in the actual pixels
- The image format may change
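Least-significant-bit (LSB) embedding is the simplest way to store data in the actual pixels. The sketch below hides a short text message in the LSBs of hypothetical 8-bit pixel values, changing each pixel by at most 1. It illustrates fragile spatial-domain embedding, not a robust watermarking scheme.

```python
def embed_lsb(pixels, bits):
    """Hide a bit sequence in the least significant bits of 8-bit pixel values."""
    if len(bits) > len(pixels):
        raise ValueError("message longer than cover image")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return marked


def extract_lsb(pixels, length):
    """Read back the first `length` hidden bits."""
    return [p & 1 for p in pixels[:length]]


def text_to_bits(text):
    """UTF-8 bytes of `text` as a flat list of bits, most significant bit first."""
    return [int(b) for byte in text.encode("utf-8") for b in format(byte, "08b")]


def bits_to_text(bits):
    """Inverse of text_to_bits()."""
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


# Hypothetical grayscale cover pixels (a flattened image).
cover = [120, 37, 200, 201, 15, 88, 90, 91, 143, 255, 0, 64, 66, 77, 12, 13] * 2
message = text_to_bits("Hi")
marked = embed_lsb(cover, message)
print(bits_to_text(extract_lsb(marked, len(message))))  # Hi
```

Because LSBs do not survive lossy re-encoding (for example, JPEG compression), a watermark that must persist when the format changes is usually embedded in a transform domain instead.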
Sparse Representation
- Sparse signals can be exactly reconstructed from a small number of elementary signals
- In collaborative sparse coding, a single point may have more than one sparse representation with similar data-fitting errors
- Sparse representations are classified into linear and non-linear representations
Dictionary Learning
- Dictionary learning can be solved by class-dependent supervised simultaneous orthogonal matching pursuit
- The problem is NP-hard, but greedy optimizations are possible
- Choose the redundant dictionary size that minimizes the average non-linear approximation error
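The greedy optimizations mentioned above can be illustrated with orthogonal matching pursuit (OMP). At each step, OMP picks the dictionary atom most correlated with the current residual, then refits all chosen coefficients by least squares. This is plain single-signal OMP over a fixed, hypothetical dictionary, not the class-dependent simultaneous variant. Learning the dictionary itself would alternate a sparse coding step like this with dictionary updates.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve a small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def omp(y, dictionary, sparsity, tol=1e-10):
    """Orthogonal matching pursuit over unit-norm atoms; returns {atom_index: coefficient}."""
    residual = list(y)
    support, coeffs = [], []
    for _ in range(sparsity):
        # Greedy step: choose the unused atom most correlated with the residual.
        j = max((i for i in range(len(dictionary)) if i not in support),
                key=lambda i: abs(dot(residual, dictionary[i])))
        support.append(j)
        atoms = [dictionary[i] for i in support]
        # Orthogonal step: least squares on the support via the normal equations.
        gram = [[dot(a, b) for b in atoms] for a in atoms]
        coeffs = solve(gram, [dot(a, y) for a in atoms])
        approx = [sum(c * a[t] for c, a in zip(coeffs, atoms)) for t in range(len(y))]
        residual = [yt - at for yt, at in zip(y, approx)]
        if math.sqrt(dot(residual, residual)) < tol:
            break
    return dict(zip(support, coeffs))


# Hypothetical redundant dictionary: the standard basis of R^4 plus one extra atom.
s = 1 / math.sqrt(2)
D = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [s, s, 0, 0]]
y = [0, 2, 0, -1]
code = omp(y, D, sparsity=2)
print(code)
```

The redundant extra atom is tried but not selected here, because the residual-correlation rule finds the two basis atoms that explain `y` exactly.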
MATLAB Sample Code for Curve Fitting:

```matlab
x = [1 2 3 4 5 6];
y = [5.5 43.1 128 290.7 498.4 978.67];   % data
p = polyfit(x, y, 4)                      % fit a degree-4 polynomial
% Compute the values of the polyfit estimate over a finer range,
% and plot the estimate over the real data values for comparison:
x2 = 1:.1:6;
y2 = polyval(p, x2);
plot(x, y, 'o', x2, y2)
grid on
```
MATLAB Sample Code for Image Retrieval Using a Bag of Features:

```matlab
% Assumes flowerImageSet (an imageSet of flower images) already exists, and that
% helperDisplayImageMontage and savedColorBagOfFeatures.mat come from the accompanying example.

% Select image features for retrieval: display a few of the flower images
helperDisplayImageMontage(flowerImageSet.ImageLocation(1:50:1000));

% Create a bag of features: load a pre-trained bagOfFeatures
load('savedColorBagOfFeatures.mat', 'colorBag');

% Index images: create a search index ...
% flowerImageIndex = indexImages(flowerImageSet, colorBag, 'SaveFeatureLocations', false);
% ... or load the pre-saved index
load('savedColorBagOfFeatures.mat', 'flowerImageIndex');

% Search for similar images: define a query image
queryImage = read(flowerImageSet, 502);
figure
imshow(queryImage)

% Search for the top 20 images with similar color content
[imageIDs, scores] = retrieveImages(queryImage, flowerImageIndex);
```
Research Topics
![satellite image processing thesis titles Biomedical Imaging](https://uwaterloo.ca/vision-image-processing-lab/sites/ca.vision-image-processing-lab/files/uploads/images/biomedical_imaging.jpg)
Biomedical Imaging
The current plethora of imaging technologies such as magnetic resonance imaging (MR), computed tomography (CT), positron emission tomography (PET), optical coherence tomography (OCT), and ultrasound provide great insight into the different anatomical and functional processes of the human body.
![satellite image processing thesis titles Computer Vision](https://uwaterloo.ca/vision-image-processing-lab/sites/ca.vision-image-processing-lab/files/uploads/images/computervision.jpg)
Computer Vision
Computer vision is the science and technology of teaching a computer to interpret images and video as well as a typical human can. Technically, computer vision encompasses the fields of image/video processing, pattern recognition, biological vision, artificial intelligence, augmented reality, mathematical modeling, statistics, probability, optimization, 2D sensors, and photography.
![satellite image processing thesis titles Image Segmentation/Classification](https://uwaterloo.ca/vision-image-processing-lab/sites/ca.vision-image-processing-lab/files/uploads/images/segmentation.png)
Image Segmentation/Classification
Extracting information from a digital image often depends on first identifying desired objects or breaking down the image into homogeneous regions (a process called 'segmentation') and then assigning these objects to particular classes (a process called 'classification'). This is a fundamental part of computer vision, combining image processing and pattern recognition techniques.
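The segmentation step can be sketched in its simplest form. A global threshold separates foreground from background, and 4-connected component labeling splits the foreground into distinct regions. The per-region areas are the kind of simple feature a later classification step could use. The threshold value and the toy image are hypothetical.

```python
from collections import deque


def segment_and_label(image, threshold):
    """Threshold a grayscale image, then label 4-connected foreground regions.

    Returns (labels, count): labels is a 2D list where 0 is background and
    1..count identify separate regions.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] > threshold and labels[r][c] == 0:
                count += 1  # found an unlabeled foreground pixel: start a new region
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill over 4-neighbours
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] > threshold and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def region_sizes(labels, count):
    """Pixel area of each region (a simple feature for later classification)."""
    sizes = [0] * (count + 1)
    for row in labels:
        for v in row:
            sizes[v] += 1
    return sizes[1:]


img = [[9, 9, 0, 0, 0],
       [9, 9, 0, 0, 8],
       [0, 0, 0, 0, 8],
       [7, 0, 0, 0, 8]]
labels, n = segment_and_label(img, threshold=5)
print(n, region_sizes(labels, n))  # 3 [4, 3, 1]
```

Real segmentation methods replace the global threshold with region growing, clustering or learned models, but the label-then-describe structure stays the same.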
![satellite image processing thesis titles Multiresolution Techniques](https://uwaterloo.ca/vision-image-processing-lab/sites/ca.vision-image-processing-lab/files/uploads/images/multires.jpg)
Multiresolution Techniques
The VIP lab has a particularly extensive history with multiresolution methods, and a significant number of research students have explored this theme. Multiresolution methods are very broad, essentially meaning that an image or video is modeled, represented, or has features extracted at more than one scale, allowing both local and non-local phenomena to be captured.
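A simple multiresolution representation is an image pyramid, in which each level halves the resolution of the previous one. The sketch below builds a mean pyramid by averaging non-overlapping 2x2 blocks. This is a simplified stand-in for a Gaussian pyramid, which blurs with a Gaussian kernel before subsampling. The input image is hypothetical.

```python
def downsample_2x(image):
    """Halve resolution by averaging each non-overlapping 2x2 block (odd edges are dropped)."""
    rows, cols = len(image) // 2 * 2, len(image[0]) // 2 * 2
    return [[(image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]


def build_pyramid(image, levels):
    """Return [full resolution, 1/2, 1/4, ...], stopping at `levels` entries or a 1-pixel side."""
    pyramid = [image]
    while len(pyramid) < levels and len(pyramid[-1]) >= 2 and len(pyramid[-1][0]) >= 2:
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid


img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # 4x4 ramp, values 0..15
pyr = build_pyramid(img, levels=3)
for level in pyr:
    print(level)
```

Coarse levels summarize large-scale structure, while fine levels keep local detail. Processing across levels is what lets multiresolution methods capture both.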
![satellite image processing thesis titles Remote Sensing](https://uwaterloo.ca/vision-image-processing-lab/sites/ca.vision-image-processing-lab/files/uploads/images/remotesensing.jpg)
Remote Sensing
Remote sensing, or the science of capturing data of the earth from airplanes or satellites, enables regular monitoring of land, ocean, and atmosphere expanses, representing data that cannot be captured using any other means. A vast amount of information is generated by remote sensing platforms and there is an obvious need to analyze the data accurately and efficiently.
![satellite image processing thesis titles Scientific Imaging](https://uwaterloo.ca/vision-image-processing-lab/sites/ca.vision-image-processing-lab/files/uploads/images/sci.jpg)
Scientific Imaging
Scientific Imaging refers to working on two- or three-dimensional imagery taken for a scientific purpose, in most cases acquired either through a microscope or remotely-sensed images taken at a distance.
![satellite image processing thesis titles Stochastic Models](https://uwaterloo.ca/vision-image-processing-lab/sites/ca.vision-image-processing-lab/files/uploads/images/stochasticmodels.jpg)
Stochastic Models
In many image processing, computer vision, and pattern recognition applications, there is often a large degree of uncertainty associated with factors such as the appearance of the underlying scene within the acquired data, the location and trajectory of the object of interest, the physical appearance (e.g., size, shape, color, etc.) of the objects being detected, etc.
![satellite image processing thesis titles Video Analysis](https://uwaterloo.ca/vision-image-processing-lab/sites/ca.vision-image-processing-lab/files/uploads/images/videoanalysis.png)
Video Analysis
Video analysis is a field within computer vision that involves the automatic interpretation of digital video using computer algorithms. Although humans are readily able to interpret digital video, developing algorithms for the computer to perform the same task has been highly elusive and is now an active research field.
![satellite image processing thesis titles Deep Evolution Figure](https://uwaterloo.ca/vision-image-processing-lab/sites/ca.vision-image-processing-lab/files/resize/uploads/images/deep_evolution_figure-100x80.jpg)
Evolutionary Deep Intelligence
Deep learning has shown considerable promise in recent years, producing tremendous results and significantly improving the accuracy of a variety of challenging problems when compared to other machine learning methods.
![satellite image processing thesis titles Discovered Radiomics Sequencer](https://uwaterloo.ca/vision-image-processing-lab/sites/ca.vision-image-processing-lab/files/resize/uploads/images/discovered_radiomics_sequencer-100x101.jpg)
Discovery Radiomics
Radiomics, which involves the high-throughput extraction and analysis of a large amount of quantitative features from medical imaging data to characterize tumor phenotype in a quantitative manner, is ushering in a new era of imaging-driven quantitative personalized cancer decision support and management.
![satellite image processing thesis titles Discovered Radiomics Sequencer](https://uwaterloo.ca/vision-image-processing-lab/sites/ca.vision-image-processing-lab/files/resize/uploads/images/hockeyimage-100x101.jpg)
Sports Analytics
Sports Analytics is a growing field in computer vision that analyzes visual cues from images to provide statistical data on players, teams, and games. Want to know how a player's technique improves the quality of the team? Can a team, based on their defensive position, increase their chances of reaching the finals? These are a few out of a plethora of questions that are answered in sports analytics.
Leveling Up Research and Publishing in Geoscience’s Open-Source Era
A new workshop held at USC’s Information Sciences Institute hopes to have a ripple effect throughout the geoscience community.
![satellite image processing thesis titles researchers at a workshop](https://viterbischool.usc.edu/wp-content/uploads/2024/06/1200.png)
Photo Credit: ISI/USC
Ancient rocks from the Southern Ocean; satellite images of the Earth’s atmosphere; soil samples from the Mojave Desert trenches. The 18 geoscientists who gathered for the first FROGS (Facilitating Reproducible Open GeoScience) workshop at USC Viterbi’s Information Sciences Institute (ISI) from June 3-6, 2024 may study a wide and diverse range of data sources, but they came with a common goal: to learn how to level up their research, data sharing, and publishing techniques.
Participants like Jhon Mojica, a senior researcher at the University of Miami working with NOAA (National Oceanic and Atmospheric Administration), got an introduction to Scientific Python and R programming languages. Mojica said, “I’m leading projects on the expansion of Port Everglades and water quality around South Florida. Learning to use Python to automate data processing will make our studies more robust and efficient.”
They were taught methods such as spectral analysis to interpret environmental variability over different timescales. Pranaykumar Tirpude, a Ph.D. student at the University of Delaware whose research involves studying 1.4 million years of data from the Southern Ocean said, “Implementing these techniques will help me better understand climate cycles and the stability of ice sheets over geological time.”
Setting sail with PyRATES
The workshop was hosted by LinkedEarth, an initiative that brings together AI and paleoclimate research to build a cohesive understanding of historical climate data by rethinking how those data are managed and analyzed.
This particular curriculum, dubbed PyRATES (Python and R Analysis of Time SerieS), catered to researchers with little to no experience in the Python and R programming languages. It was led by Deborah Khider, paleoclimatologist and Research Scientist at ISI; Julien Emile-Geay, Professor of Earth Sciences at USC Dornsife College of Letters, Arts & Sciences; and, from Northern Arizona University’s School of Earth and Sustainability, Associate Professor Nick McKay and Data Scientist David Edge.
Khider, Emile-Geay, McKay and Edge set out with a lofty goal: “to elevate participants’ research to the next level by equipping them with advanced techniques that were previously beyond their reach, and helping them apply these methods to a broader array of datasets than previously thought possible.” Khider explained, “With PyRATES, we wanted participants to come out with the skills to do their science using open source libraries while sharing their own project openly and in a reproducible manner.”
Embracing open science with FAIR publishing
In addition to the advanced research methods, FROGS participants were taught best practices for FAIR science publishing: the principles of making research Findable, Accessible, Interoperable and Reusable. Through hands-on sessions, they learned techniques for data versioning, managing metadata, using open data repositories, and applying appropriate licensing. These skills improve reproducibility and collaboration: they make data and findings easy for others in the scientific community to find, access, and reuse, and they help researchers meet the evolving standards of journals and funding agencies.
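One small ingredient of data versioning and metadata management can be sketched as follows: a data file is fingerprinted with a SHA-256 checksum, and that checksum is stored in a metadata record next to a version string and a license. This is only an illustration under stated assumptions. The field names and the helper functions `describe_dataset` and `write_metadata` are hypothetical and do not follow any particular repository's schema (such as DataCite or Zenodo), which define their own required fields.

```python
import hashlib
import json
from datetime import date
from pathlib import Path

def describe_dataset(path, title, version, license_id, creators):
    """Build a minimal, hypothetical metadata record for one data file."""
    data = Path(path).read_bytes()
    return {
        "title": title,
        "version": version,                          # e.g. a semantic version string
        "creators": list(creators),
        "license": license_id,                       # e.g. an SPDX identifier such as "CC-BY-4.0"
        "file_name": Path(path).name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # changes whenever the file's bytes change
        "date": date.today().isoformat(),
    }

def write_metadata(record, out_path):
    """Save the record as pretty-printed JSON alongside the data."""
    Path(out_path).write_text(json.dumps(record, indent=2, sort_keys=True))
```

Because the checksum changes whenever the file's bytes change, comparing checksums reveals whether two copies of a dataset are the same version.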
Kathryn Chen, a biological oceanographer at the Scripps Institution of Oceanography, highlighted this emphasis: “I am currently drafting my first paper, so I particularly appreciated the FAIR publishing aspect. Learning about versioning datasets, code, and workflows has been instrumental for my research.”
In the scientists’ own words…
The workshop drew participants from various fields within geosciences, each bringing unique perspectives and gaining invaluable insights.
Among them was Dannielle Fougere, a fifth-year Ph.D. student in the Earth Sciences department at USC. Fougere, a paleoseismologist, is focused on understanding the behavior of the Garlock Fault in the Mojave Desert. Her research involves calculating slip rates to determine how fast the fault has been moving over time, which is crucial for understanding seismic activity in the region. Fougere explained, “Paleoseismology is not very quantitative. With techniques learned here, I’d like to add a component in my thesis that’s a bit more quantitative to solidify things for editors and reviewers.”
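At its simplest, a slip rate is the offset of a displaced feature divided by that feature's age. The sketch below propagates uncertainty in both quantities by Monte Carlo sampling, which is one common way to make such an estimate more quantitative. The inputs (30 ± 2 m of offset on a feature dated to 3000 ± 200 years) and the Gaussian error model are assumptions for illustration. They are not values or methods from Fougere's study.

```python
import numpy as np

def slip_rate_mm_per_yr(offset_m, age_yr):
    """Average slip rate: cumulative offset divided by the time over which it accrued.

    Metres per year are converted to millimetres per year (x 1000).
    """
    return offset_m * 1000.0 / age_yr

def slip_rate_interval(offset_m, offset_sd, age_yr, age_sd, n=100_000, seed=0):
    """Monte Carlo propagation of (assumed Gaussian) offset and age uncertainties.

    Returns the median slip rate and a 95% interval, both in mm/yr.
    """
    rng = np.random.default_rng(seed)
    offsets = rng.normal(offset_m, offset_sd, n)
    ages = rng.normal(age_yr, age_sd, n)
    keep = ages > 0                      # discard physically impossible non-positive ages
    rates = slip_rate_mm_per_yr(offsets[keep], ages[keep])
    lo, median, hi = np.percentile(rates, [2.5, 50.0, 97.5])
    return float(median), (float(lo), float(hi))

# Hypothetical inputs: 30 ± 2 m offset on a feature dated to 3000 ± 200 years.
print(slip_rate_mm_per_yr(30.0, 3000.0))  # 10.0
print(slip_rate_interval(30.0, 2.0, 3000.0, 200.0))
```

Reporting the interval alongside the central value shows reviewers how much the dating and offset uncertainties constrain the rate.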
Victor Olawoyin, a Ph.D. student in earthquake seismology at Boston College, found the workshop particularly beneficial for learning time series analysis and improving the reproducibility of his research. “The time series analysis is crucial for interpreting seismograms. The publishing part was also really cool, as it helps in releasing data and software with better workflow,” Olawoyin said. He plans to apply these new skills directly to his Ph.D. thesis and future research projects.
Venkataramana Sridhar, a faculty member from Virginia Tech specializing in hydrology, climate change, and water resources, saw the workshop as a way to enhance both his research and teaching. “This program covered a wide range of topics from publishing to data analytics, all crucial for my research on how climate change impacts hydrology and water resources. The skills and insights gained here will be invaluable in both my classroom teachings and research endeavors,” he stated.
Sreedevi Puthiyamadam Vasu, a Ph.D. student in Atmospheric Science at the Florida Institute of Technology, found the training perfectly tailored to her research needs. “My work on seasonal and sub-seasonal prediction of precipitation relies heavily on time series analysis. The focus on Python and R was exactly what I needed to transition from proprietary software like MATLAB to open-source tools, enhancing my ability to contribute to open science.”
Building a collaborative future
The workshop not only equipped participants with new technical skills but also fostered a sense of community and collaboration. The interdisciplinary nature of the event brought fresh perspectives to geoscientific challenges, inspiring innovative approaches and potential collaborations.
As these researchers return to their respective sub-fields equipped with new capabilities in open-source programming and FAIR principles, the hope is that the effects of this workshop will influence the broader geoscience community. The commitment to open science and reproducibility promises to drive forward the quality and impact of future geoscientific research.
The LinkedEarth team plans to continue supporting the geoscience community with similar training opportunities, fostering an ecosystem where scientific discoveries are not only made but shared openly and efficiently.
Published on July 15th, 2024
Last updated on July 15th, 2024
A list of completed theses and new thesis topics from the Computer Vision Group. ... Deconvolution is an important image processing step for improving the quality of microscopy images: it removes out-of-focus light and yields higher resolution and a better signal-to-noise ratio. ... Satellite images have many applications, e.g. in meteorology, geography ...
Satellite Image Processing with ... Methods and Visual Attention. Md Ibne Sina. Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment of the requirements for the degree of ... Dr. Alberto Chávez-Aragón for all those interesting discussions on diverse types of topics. I owe my deepest gratitude to my lovely ...
The strong results from these models, with AP as high as 0.937 and AR as high as 0.959, demonstrate the feasibility of deep learning for automated satellite image annotation. 1. Introduction. Despite substantial advances in global human well-being, the world continues to experience humanitarian crises and natural disasters.
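The snippet quotes AP and AR without defining them. In object detection these usually stand for average precision and average recall. Both are computed by matching predicted boxes to ground-truth boxes by intersection-over-union (IoU), and protocols such as COCO also average over several IoU thresholds. The sketch below shows only the building blocks: IoU, plus precision and recall at a single IoU threshold using greedy matching. It is an illustration under these assumptions, not the cited study's evaluation code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(preds: List[Tuple[Box, float]], truths: List[Box], thr: float = 0.5):
    """Greedy matching: higher-scoring predictions claim ground-truth boxes first.

    A prediction counts as a true positive if it overlaps a still-unmatched
    ground-truth box with IoU >= thr; each ground-truth box is matched at most once.
    """
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = None, thr
        for j, gt in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0   # tp / (tp + fp)
    recall = tp / len(truths) if truths else 0.0    # tp / (tp + fn)
    return precision, recall
```

Average precision then summarizes how precision varies as the score threshold sweeps from high to low, rather than reporting a single operating point.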
Historical map classification has become an important application given today's ever-changing land boundaries. Historical map changes include changes in the boundaries of cities/states, vegetation regions, water bodies and so forth. Change detection in these regions is mainly carried out using satellite images. Hence, extensive knowledge of satellite image processing is necessary ...
Deep learning has revolutionized the analysis and interpretation of satellite and aerial imagery, addressing unique challenges such as vast image sizes and a wide array of object classes. This repository provides an exhaustive overview of deep learning techniques specifically tailored for satellite and aerial image processing.
Hence, extensive knowledge of satellite image processing is necessary for historical map classification applications. An exhaustive analysis of the merits and demerits of many satellite image ...
Satellite images specifically can be modified in a number of ways, such as inserting objects into an image to hide existing scenes and structures. In this thesis, we describe the use of a Conditional Generative Adversarial Network (cGAN) to identify the presence of such spliced forgeries within satellite images.
Oulu University of Applied Sciences Information Technology, Internet Services. Author: Hung Dao Title of the bachelor's thesis: Image Classification Using Convolutional Neural Networks Supervisor: Jukka Jauhiainen Term and year of completion: Spring 2020 Number of pages: 31. The objective of this thesis was to study the application of deep ...
Chapter 1: Introduction. 1.1 SAR Image. Synthetic Aperture Radar (SAR) is a type of active remote sensing technology invented in the 1950s by a team of Goodyear Aircraft Company experts.
Image processing techniques in the first category were widely used to improve digitized pictures sent by satellites or captured by X-ray cameras, TV cameras and other digitizers. This graduate project introduces image processing techniques and shows how to apply them to a digitized image on a personal computer system.
Satellite image processing plays a vital role in research and development in Astronomy, Remote Sensing, GIS, Agriculture Monitoring, Disaster Management and many other fields of study. Satellite images are recorded in digital form and then processed by computers to extract information. Variations in scene characteristics are represented as variations in brightness in the images. A ...
List of dissertations / theses on the topic 'Satellite Imagery Processing'. Scholarly publications with full text pdf download. Related research topic ideas.
Alexander Lau. This paper presents an image processing-based method for analyzing satellite scans of 24 airports and runways of different sizes and locations to detect and understand ...
A systematic review of the use of Deep Learning in Satellite Imagery for Agriculture. Brandon Victor, Zhen He, Aiden Nibali. Agricultural research is essential for increasing food production to meet the requirements of an increasing population in the coming decades. Recently, satellite technology has been improving rapidly and deep learning has ...
List of dissertations / theses on the topic 'Satellite image analysis'. Scholarly publications with full text pdf download. Related research topic ideas.
The high resolution of modern cameras puts significant performance pressure on image processing pipelines. Tuning the parameters of these pipelines for speed is subject to stringent image quality constraints and requires significant efforts from skilled programmers. Because quality is driven by perceptual factors with which most quantitative ...
Dissertations / theses on the topic 'Satellite Image Processing'. Author: Grafiati.
Possible satellite image processing project topics include:
- Relative radiometric correction in CCD images
- Image fusion for object-oriented mapping
- Land and vegetation cover estimation
- Land classification in multispectral and hyperspectral images
- Content-based registration

Beyond these, there are many more satellite-based project topics.
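As an illustration of the first topic, one common form of relative radiometric correction fits a linear gain and offset that map a "subject" image band onto a "reference" band. The fit uses pixels assumed not to have changed between acquisitions (pseudo-invariant features such as rooftops or bare rock). The sketch below assumes co-registered single-band arrays and a precomputed boolean mask of invariant pixels. The function name `relative_normalize` is hypothetical, and selecting the invariant pixels, often the hardest part in practice, is left out.

```python
import numpy as np

def relative_normalize(subject, reference, invariant_mask):
    """Linearly map one band of `subject` onto the radiometry of `reference`.

    Gain and offset are fitted by least squares over pixels where
    `invariant_mask` is True, then applied to every pixel of `subject`.
    Returns (normalized band, gain, offset).
    """
    s = subject[invariant_mask].astype(float)
    r = reference[invariant_mask].astype(float)
    gain, offset = np.polyfit(s, r, deg=1)   # fit r ≈ gain * s + offset
    return gain * subject.astype(float) + offset, gain, offset
```

For a multispectral scene, the same fit would be repeated band by band.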
This guide has been prepared for SSO students to assist them in the research and writing of their master's thesis. It outlines the specific steps necessary to successfully complete a thesis of high quality. It has been written specifically for you, the student. See the SSO Thesis Guide link under the Thesis Documents and Topics section.
Latest research topics in image processing for research scholars:
- A hybrid classification scheme for plant disease detection
- Edge detection using ant and bee colony optimization
- An improved PNLM filtering scheme for denoising MRI images
- A classification method for brain tumor detection

Image processing is used to enhance raw images received from sources such as cameras and sensors. An image processing thesis depends on both the implementation and the preparation of the paper. Typical image processing operations include:
- enhancing important image details while suppressing other information
- correcting image imperfections and defects
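A simple example of enhancement is a linear percentile contrast stretch, which is often applied to satellite bands before display. Values between two chosen percentiles are spread over the full 0-255 display range, and the extremes are clipped. The default 2% and 98% cut-offs and the function name `percentile_stretch` are assumptions for illustration, not a standard fixed by any of the works listed above.

```python
import numpy as np

def percentile_stretch(band, low_pct=2.0, high_pct=98.0):
    """Linearly stretch one image band to 0-255 between two percentiles.

    Values below the low percentile clip to 0 and values above the high
    percentile clip to 255, so a few very dark or very bright pixels do
    not compress the contrast of the rest of the scene.
    """
    band = np.asarray(band, dtype=float)
    lo, hi = np.percentile(band, [low_pct, high_pct])
    if hi <= lo:                              # flat band: nothing to stretch
        return np.zeros(band.shape, dtype=np.uint8)
    scaled = (band - lo) / (hi - lo)
    return (np.clip(scaled, 0.0, 1.0) * 255).round().astype(np.uint8)
```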
Computer Vision. Computer vision is the science and technology of teaching a computer to interpret images and video as well as a typical human. Technically, computer vision encompasses the fields of image/video processing, pattern recognition, biological vision, artificial intelligence, augmented reality, mathematical modeling, statistics, probability, optimization, 2D sensors, and photography.