A list of completed theses and new thesis topics from the Computer Vision Group.

Are you about to start a BSc or MSc thesis? Please read our instructions for preparing and delivering your work.

Below we list possible thesis topics for Bachelor and Master students in the areas of Computer Vision, Machine Learning, Deep Learning and Pattern Recognition. The project descriptions leave plenty of room for your own ideas. If you would like to discuss a topic in detail, please contact the supervisor listed below and Prof. Paolo Favaro to schedule a meeting. Note that for MSc students in Computer Science it is required that the official advisor is a professor in CS.

AI deconvolution of light microscopy images

Level: master.

Background Light microscopy has become an indispensable tool in life sciences research. Deconvolution is an important image processing step for improving the quality of microscopy images: it removes out-of-focus light and yields higher resolution and a better signal-to-noise ratio. Classical deconvolution methods, such as regularised or blind deconvolution, are implemented in numerous commercial software packages and widely used in research. Recently, AI-based deconvolution algorithms have been introduced and are under active development, as they have shown high application potential.

Aim Adaptation of available AI algorithms for the deconvolution of microscopy images, and validation of these methods against state-of-the-art commercially available deconvolution software.

Material and Methods The student will implement and further develop available AI deconvolution methods and acquire test microscopy images of different modalities. The performance of the developed AI algorithms will be validated against available commercial deconvolution software.
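As an illustration of the classical side of the comparison, a hedged sketch of a Richardson-Lucy baseline using scikit-image on a synthetically blurred image; the PSF, noise level and iteration count are illustrative assumptions, not part of the project description.

    import numpy as np
    from scipy.signal import convolve2d
    from skimage import data, restoration

    # Synthetic test case: blur a known image with a simple PSF, then deconvolve.
    image = data.camera().astype(float) / 255.0
    psf = np.outer(np.hanning(9), np.hanning(9))
    psf /= psf.sum()
    blurred = convolve2d(image, psf, mode="same", boundary="symm")
    blurred = np.clip(blurred + 0.01 * np.random.standard_normal(blurred.shape), 0, 1)

    # Classical baseline: Richardson-Lucy deconvolution (30 iterations, illustrative).
    restored = restoration.richardson_lucy(blurred, psf, 30)
    print("MSE blurred :", float(np.mean((blurred - image) ** 2)))
    print("MSE restored:", float(np.mean((restored - image) ** 2)))

The same kind of synthetic ground-truth comparison could later be repeated with the AI methods and the commercial software on real microscopy data.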

Nature of the Thesis:

  • AI algorithm development and implementation: 50%.
  • Data acquisition: 10%.
  • Comparison of performance: 40%.

Requirements

  • Interest in imaging.
  • Solid knowledge of AI.
  • Good programming skills.

Supervisors Paolo Favaro, Guillaume Witz, Yury Belyaev.

Institutes Computer Vision Group, Digital Science Lab, Microscopy Imaging Center.

Contact Yury Belyaev, Microscopy Imaging Center, [email protected], +41 78 899 0110.

Instance segmentation of cryo-ET images

Level: bachelor/master.

In the 1600s, a pioneering Dutch scientist named Antonie van Leeuwenhoek embarked on a remarkable journey that would forever transform our understanding of the natural world. Armed with a simple yet ingenious invention, the light microscope, he delved into uncharted territory, peering through its lens to reveal the hidden wonders of microscopic structures. Fast forward to today, where cryo-electron tomography (cryo-ET) has emerged as a groundbreaking technique, allowing researchers to study proteins within their natural cellular environments. Proteins, functioning as vital nano-machines, play crucial roles in life and understanding their localization and interactions is key to both basic research and disease comprehension. However, cryo-ET images pose challenges due to inherent noise and a scarcity of annotated data for training deep learning models.


Credit: S. Albert et al./PNAS (CC BY 4.0)

To address these challenges, this project aims to develop a self-supervised pipeline utilizing diffusion models for instance segmentation in cryo-ET images. By leveraging the power of diffusion models, which iteratively diffuse information to capture underlying patterns, the pipeline aims to refine and accurately segment cryo-ET images. Self-supervised learning, which relies on unlabeled data, reduces the dependence on extensive manual annotations. Successful implementation of this pipeline could revolutionize the field of structural biology, facilitating the analysis of protein distribution and organization within cellular contexts. Moreover, it has the potential to alleviate the limitations posed by limited annotated data, enabling more efficient extraction of valuable information from cryo-ET images and advancing biomedical applications by enhancing our understanding of protein behavior.

Methods The segmentation pipeline for cryo-electron tomography (cryo-ET) images consists of two stages: training a diffusion model for image generation and training an instance segmentation U-Net using synthetic and real segmentation masks.

    1. Diffusion Model Training:
        a. Data Collection: Collect and curate cryo-ET image datasets from the EMPIAR database (https://www.ebi.ac.uk/empiar/).
        b. Architecture Design: Select an appropriate architecture for the diffusion model.
        c. Model Evaluation: Cryo-ET experts will help assess image quality and fidelity through visual inspection and quantitative measures.
    2. Building the Segmentation Dataset:
        a. Synthetic and real mask generation: Use the trained diffusion model to generate synthetic cryo-ET images. The diffusion process will be seeded from either a real or a synthetic segmentation mask. This will yield pairs of cryo-ET images and segmentation masks.
    3. Instance Segmentation U-Net Training:
        a. Architecture Design: Choose an appropriate instance segmentation U-Net architecture.
        b. Model Evaluation: Evaluate the trained U-Net using precision, recall, and F1 score metrics.

By combining the diffusion model for cryo-ET image generation and the instance segmentation U-Net, this pipeline provides an efficient and accurate approach to segment structures in cryo-ET images, facilitating further analysis and interpretation.
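As an illustration of stage 1, a minimal sketch of the standard denoising-diffusion training objective such a generator would be optimised with; the `model` (which predicts the added noise), the noise schedule and the number of timesteps are generic assumptions, not the project's final design.

    import torch

    def ddpm_training_step(model, x0, optimizer, T=1000):
        # Standard DDPM objective: predict the noise added at a random timestep.
        betas = torch.linspace(1e-4, 0.02, T)
        alphas_bar = torch.cumprod(1.0 - betas, dim=0)
        t = torch.randint(0, T, (x0.shape[0],))
        noise = torch.randn_like(x0)
        a = alphas_bar[t].view(-1, 1, 1, 1)
        x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise   # forward (noising) process
        loss = torch.nn.functional.mse_loss(model(x_t, t), noise)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()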

References
    1. Kwon, Diana. "The secret lives of cells - as never seen before." Nature 598.7882 (2021): 558-560.
    2. Moebel, Emmanuel, et al. "Deep learning improves macromolecule identification in 3D cellular cryo-electron tomograms." Nature Methods 18.11 (2021): 1386-1394.
    3. Rice, Gavin, et al. "TomoTwin: generalized 3D localization of macromolecules in cryo-electron tomograms with structural data mining." Nature Methods (2023): 1-10.

Contacts Prof. Thomas Lemmin, Institute of Biochemistry and Molecular Medicine, Bühlstrasse 28, 3012 Bern ( [email protected] )

Prof. Paolo Favaro, Institute of Computer Science, Neubrückstrasse 10, 3012 Bern ( [email protected] )

Adding and removing multiple sclerosis lesions in imaging with diffusion networks

Background Multiple sclerosis (MS) lesions are the result of demyelination: they appear as dark spots on T1-weighted MRI and as bright spots on FLAIR MRI. Image analysis for MS patients requires both the accurate detection of new and enhancing lesions and the assessment of atrophy via local thickness and/or volume changes in the cortex. Detection of new and growing lesions is possible using deep learning, but is made difficult by the relative lack of training data; meanwhile, cortical morphometry can be affected by the presence of lesions, meaning that removing lesions prior to morphometry may be more robust. Existing 'lesion filling' methods are rather crude, yielding unrealistic-appearing brains where the borders of the removed lesions are clearly visible.

Aim: Denoising diffusion networks are the current gold standard in MRI image generation [1]: we aim to leverage this technology to remove and add lesions to existing MRI images. This will allow us to create realistic synthetic MRI images for training and validating MS lesion segmentation algorithms, and for investigating the sensitivity of morphometry software to the presence of MS lesions at a variety of lesion load levels.

Materials and Methods: A large, annotated, heterogeneous dataset of MRI data from MS patients, as well as images of healthy controls without white matter lesions, will be available for developing the method. The student will work in a research group with a long track record in applying deep learning methods to neuroimaging data, as well as experience training denoising diffusion networks.
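One possible way (among others) to use a trained diffusion model for lesion removal or insertion is mask-conditioned inpainting in the RePaint style, sketched below; `denoise_step` and `forward_noise` are assumed helpers wrapping the trained network and its noise schedule, and the approach is an illustration rather than the project's prescribed method.

    import torch

    @torch.no_grad()
    def masked_reverse_step(x_t, x_known, lesion_mask, t, denoise_step, forward_noise):
        # lesion_mask == 1 marks voxels to be synthesised (lesion added or filled in),
        # 0 marks tissue that must stay identical to the original scan.
        x_unknown = denoise_step(x_t, t)          # one reverse-diffusion step
        x_keep = forward_noise(x_known, t - 1)    # re-noise the original image to level t-1
        return lesion_mask * x_unknown + (1 - lesion_mask) * x_keep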

Nature of the Thesis:

Literature review: 10%

Replication of Blob Loss paper: 10%

Implementation of the sliding window metrics: 10%

Training on MS lesion segmentation task: 30%

Extension to other datasets: 20%

Results analysis: 20%

Fig. Results of an existing lesion filling algorithm, showing inadequate performance

Requirements:

Interest/Experience with image processing

Python programming knowledge (Pytorch bonus)

Interest in neuroimaging

Supervisor(s):

PD. Dr. Richard McKinley

Institutes: Diagnostic and Interventional Neuroradiology

Center for Artificial Intelligence in Medicine (CAIM), University of Bern

References: [1] Brain Imaging Generation with Latent Diffusion Models, Pinaya et al., accepted at the Deep Generative Models workshop @ MICCAI 2022, https://arxiv.org/abs/2209.07162

Contact: PD Dr. Richard McKinley, Support Centre for Advanced Neuroimaging ( [email protected] )

Improving metrics and loss functions for targets with imbalanced size: sliding window Dice coefficient and loss.

Background The Dice coefficient is the most commonly used metric for segmentation quality in medical imaging, and a differentiable version of the coefficient is often used as a loss function, in particular for small target classes such as multiple sclerosis lesions. The Dice coefficient has the benefit that it is applicable in instances where the target class is in the minority (for example, when segmenting small lesions). However, if lesion sizes are mixed, the loss and metric are biased towards performance on large lesions, causing smaller lesions to be missed and harming overall lesion detection. A recently proposed loss function (blob loss [1]) aims to combat this by treating each connected component of a lesion mask separately, and claims improvements over Dice loss on lesion detection scores in a variety of tasks.

Aim: The aim of this thesis is twofold. First, to benchmark blob loss against a simple, potentially superior loss for instance detection: sliding window Dice loss, in which the Dice loss is calculated over a sliding window across the area/volume of the medical image. Second, we will investigate whether a sliding window Dice coefficient is better correlated with lesion-wise detection metrics than the Dice coefficient, and may serve as an alternative metric capturing both global and instance-wise detection.
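A minimal sketch of how a sliding window Dice loss could look in PyTorch (2D case; the window size, stride and smoothing constant are illustrative assumptions):

    import torch
    import torch.nn.functional as F

    def sliding_window_dice_loss(pred, target, window=32, stride=16, eps=1e-6):
        # pred, target: (B, 1, H, W); pred holds probabilities in [0, 1].
        p = F.unfold(pred, kernel_size=window, stride=stride)    # (B, window*window, n_windows)
        t = F.unfold(target, kernel_size=window, stride=stride)
        intersection = (p * t).sum(dim=1)
        denominator = p.sum(dim=1) + t.sum(dim=1)
        dice_per_window = (2 * intersection + eps) / (denominator + eps)
        return 1 - dice_per_window.mean()                        # average over windows and batch

Note that empty windows score a Dice of 1 by construction here; whether to keep, reweight or discard them is one of the design questions the thesis would need to settle.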

Materials and Methods: A large, annotated, heterogeneous dataset of MRI data from MS patients will be available for benchmarking the method, as well as our existing codebases for MS lesion segmentation. Extension of the method to other diseases and datasets (such as those covered in the blob loss paper) will make the method more plausible for publication. The student will work alongside clinicians and engineers carrying out research in multiple sclerosis lesion segmentation, in particular in the context of our running project supported by the CAIM grant.


Fig. An annotated MS lesion case, showing the variety of lesion sizes

References: [1] blob loss: instance imbalance aware loss functions for semantic segmentation, Kofler et al., https://arxiv.org/abs/2205.08209

Idempotent and partial skull-stripping in multispectral MRI imaging

Background Skull stripping (or brain extraction) refers to the masking of non-brain tissue from structural MRI imaging.  Since 3D MRI sequences allow reconstruction of facial features, many data providers supply data only after skull-stripping, making this a vital tool in data sharing.  Furthermore, skull-stripping is an important pre-processing step in many neuroimaging pipelines, even in the deep-learning era: while many methods could now operate on data with skull present, they have been trained only on skull-stripped data and therefore produce spurious results on data with the skull present.

High-quality skull-stripping algorithms based on deep learning are now widely available: the most prominent example is HD-BET [1]. A major downside of HD-BET is its behaviour on datasets to which skull-stripping has already been applied: in this case the algorithm falsely identifies brain tissue as skull and masks it. A skull-stripping algorithm F not exhibiting this behaviour would be idempotent: F(F(x)) = F(x) for any image x. Furthermore, legacy datasets from before the availability of high-quality skull-stripping algorithms may still contain images which have been inadequately skull-stripped: currently the only solution to improve the skull-stripping on this data is to go back to the original data source or to manually correct the skull-stripping, which is time-consuming and prone to error.

Aim: In this project, the student will develop an idempotent skull-stripping network which can also handle partially skull-stripped inputs. In the best case, the network will operate well on a large subset of the data we work with (e.g. structural MRI, diffusion-weighted MRI, perfusion-weighted MRI and susceptibility-weighted MRI, at a variety of field strengths) to maximize the future applicability of the network across the teams in our group.

Materials and Methods: Multiple datasets, both publicly available and internal (encompassing thousands of 3D volumes) will be available. Silver standard reference data for standard sequences at 1.5T and 3T can be generated using existing tools such as HD-BET: for other sequences and field strengths semi-supervised learning or methods improving robustness to domain shift may be employed.  Robustness to partial skull-stripping may be induced by a combination of learning theory and model-based approaches.
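One simple way to encourage idempotence during training, sketched here under the assumption that the network outputs a brain-mask probability map, is to feed the network its own masked output and penalise any change in the prediction; this is an illustrative consistency term, not the prescribed approach.

    import torch
    import torch.nn.functional as F

    def idempotence_consistency_loss(model, image, mask_gt=None):
        mask1 = model(image)                        # first pass on the raw image
        stripped = image * (mask1 > 0.5).float()    # simulate an already skull-stripped input
        mask2 = model(stripped)                     # second pass should agree with the first
        loss = F.mse_loss(mask2, mask1.detach())
        if mask_gt is not None:                     # optional supervised term
            loss = loss + F.binary_cross_entropy(mask1, mask_gt)
        return loss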

Nature of the Thesis:

Dataset curation: 10%

Idempotent skull-stripping model building: 30%

Modelling of partial skull-stripping: 10%

Extension of model to handle partial skull: 30%

Results analysis: 10%

Fig. An example of failed skull-stripping requiring manual correction

References: [1] Isensee, F., Schell, M., Pflueger, I., et al. Automated brain extraction of multisequence MRI using artificial neural networks. Hum Brain Mapp. 2019; 40: 4952-4964. https://doi.org/10.1002/hbm.24750

Automated leaf detection and leaf area estimation (for Arabidopsis thaliana)

Correlating plant phenotypes such as leaf area or number of leaves to the genotype (i.e. changes in DNA) is a common goal for plant breeders and molecular biologists. Such data can not only help to understand fundamental processes in nature, but can also help to improve ecotypes, e.g., to perform better under climate change or to reduce fertiliser input. However, collecting data for many plants is very time-consuming, and automated data acquisition is necessary.

The project aims at building a machine learning model to automatically detect plants in top-view images (see examples below), segment their leaves (see Fig C) and to estimate the leaf area. This information will then be used to determine the leaf area of different Arabidopsis ecotypes. The project will be carried out in collaboration with researchers of the Institute of Plant Sciences at the University of Bern. It will also involve the design and creation of a dataset of plant top-views with the corresponding annotation (provided by experts at the Institute of Plant Sciences).
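Once a leaf segmentation is available, the trait extraction itself is straightforward; a sketch, assuming an integer-labelled instance mask and a known image resolution (both hypothetical here), could look like this:

    import numpy as np

    def leaf_areas_mm2(instance_mask, mm_per_pixel):
        # instance_mask: 2D integer array, 0 = background, k > 0 = leaf instance k.
        labels = np.unique(instance_mask)
        labels = labels[labels > 0]
        area_of_one_pixel = mm_per_pixel ** 2
        return {int(k): float((instance_mask == k).sum() * area_of_one_pixel) for k in labels}

    # Example: leaf count and total leaf area from a predicted mask.
    # areas = leaf_areas_mm2(predicted_mask, mm_per_pixel=0.2)
    # print(len(areas), "leaves,", sum(areas.values()), "mm^2 total")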


Contact: Prof. Dr. Paolo Favaro ( [email protected] )

Master Projects at the ARTORG Center

The Gerontechnology and Rehabilitation group at the ARTORG Center for Biomedical Engineering is offering multiple MSc thesis projects to students who are interested in working with real patient data, artificial intelligence and machine learning algorithms. The goal of these projects is to transfer the findings to the clinic in order to solve today's healthcare problems and thus to improve the quality of life of patients.

  • Assessment of Digital Biomarkers at Home by Radar. [PDF]
  • Comparison of Radar, Seismograph and Ballistocardiography to Monitor Sleep at Home. [PDF]
  • Sentimental Analysis in Speech. [PDF]

Contact: Dr. Stephan Gerber ( [email protected] )

Internship in Computational Imaging at Prophesee

A 6-month internship at Prophesee, Grenoble, is offered to a talented Master student.

The topic of the internship is burst imaging, following the work of Sam Hasinoff, and exploring ways to improve it using event-based vision.

Compensation to cover the expenses of living in Grenoble is offered. Only students who have the legal right to work in France can apply.

Anyone interested can send an email with their CV to Daniele Perrone ( [email protected] ).

Using machine learning applied to wearables to predict mental health

This Master's project lies at the intersection of psychiatry and computer science and aims to use machine learning techniques to improve health. Using sensors to detect sleep and waking behavior has as yet unexplored potential to reveal insights into health. In this study, we make use of a watch-like device, called an actigraph, which tracks motion to quantify sleep behavior and waking activity. Participants in the study consist of healthy and depressed adolescents who wear actigraphs for a year, during which time we query their mental health status monthly using online questionnaires. For this Master's thesis we aim to use machine learning methods to predict mental health based on the data from the actigraph. The ability to predict mental health crises based on sleep and wake behavior would provide an opportunity for intervention, significantly impacting the lives of patients and their families. This Master's thesis is a collaboration between Professor Paolo Favaro at the Institute of Computer Science ( [email protected] ) and Dr. Leila Tarokh at the Universitäre Psychiatrische Dienste (UPD) ( [email protected] ). We are looking for a highly motivated individual interested in bridging disciplines.

Bachelor or Master Projects at the ARTORG Center

The Gerontechnology and Rehabilitation group at the ARTORG Center for Biomedical Engineering is offering multiple BSc and MSc thesis projects to students who are interested in working with real patient data, artificial intelligence and machine learning algorithms. The goal of these projects is to transfer the findings to the clinic in order to solve today's healthcare problems and thus to improve the quality of life of patients.

  • Machine Learning Based Gait-Parameter Extraction by Using Simple Rangefinder Technology. [PDF]
  • Detection of Motion in Video Recordings. [PDF]
  • Home-Monitoring of Elderly by Radar. [PDF]
  • Gait feature detection in Parkinson's Disease. [PDF]
  • Development of an arthroscopic training device using virtual reality. [PDF]

Contact: Dr. Stephan Gerber ( [email protected] ), Michael Single ( [email protected] )

Dynamic Transformer

Level: bachelor.

Visual Transformers have obtained state-of-the-art classification accuracies [ViT, DeiT, T2T, BoTNet]. A mixture of experts can be used to increase the capacity of a neural network by learning instance-dependent execution pathways in the network [MoE]. In this research project we aim to push transformers to their limit and combine their dynamic attention with MoEs. Compared to the Switch Transformer [Switch], we will use a much more efficient formulation of mixing [CondConv, DynamicConv], and we will apply this idea in the attention part of the transformer rather than in the fully connected layer.

  • Input dependent attention kernel generation for better transformer layers.
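As a rough illustration of the intended direction (not a specification of the final design), a CondConv-style, input-conditioned mixture of QKV projections could look like the sketch below; the number of experts and the routing on the mean token are assumptions made for the example.

    import torch
    import torch.nn as nn

    class MixtureQKV(nn.Module):
        # Input-dependent mixing of expert projection matrices for the attention block.
        def __init__(self, dim, num_experts=4):
            super().__init__()
            self.experts = nn.Parameter(0.02 * torch.randn(num_experts, dim, 3 * dim))
            self.router = nn.Linear(dim, num_experts)

        def forward(self, x):                                    # x: (B, N, D)
            gate = self.router(x.mean(dim=1)).softmax(dim=-1)    # one mixing weight per sample
            w = torch.einsum("be,edf->bdf", gate, self.experts)  # per-sample QKV projection
            qkv = torch.einsum("bnd,bdf->bnf", x, w)             # (B, N, 3D)
            return qkv.chunk(3, dim=-1)                          # q, k, v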

Publication Opportunity: Dynamic Neural Networks Meets Computer Vision (a CVPR 2021 Workshop)

Extensions:

  • The same idea could be extended to other ViT/Transformer based models [DETR, SETR, LSTR, TrackFormer, BERT]

Related Papers:

  • Visual Transformers: Token-based Image Representation and Processing for Computer Vision [ViT]
  • DeiT: Data-efficient Image Transformers [DeiT]
  • Bottleneck Transformers for Visual Recognition [BoTNet]
  • Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [T2TViT]
  • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer [MoE]
  • Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity [Switch]
  • CondConv: Conditionally Parameterized Convolutions for Efficient Inference [CondConv]
  • Dynamic Convolution: Attention over Convolution Kernels [DynamicConv]
  • End-to-End Object Detection with Transformers [DETR]
  • Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [SETR]
  • End-to-end Lane Shape Prediction with Transformers [LSTR]
  • TrackFormer: Multi-Object Tracking with Transformers [TrackFormer]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [BERT]

Contact: Sepehr Sameni

Visual Transformers have obtained state-of-the-art classification accuracies for 2D images [ViT, DeiT, T2T, BoTNet]. In this project, we aim to extend the same ideas to 3D data (videos), which requires a more efficient attention mechanism [Performer, Axial, Linformer]. In order to accelerate the training process, we could use the [Multigrid] technique.

  • Better video understanding by attention blocks.

Publication Opportunity: LOVEU (a CVPR workshop) , Holistic Video Understanding (a CVPR workshop) , ActivityNet (a CVPR workshop)

  • Rethinking Attention with Performers [Performer]
  • Axial Attention in Multidimensional Transformers [Axial]
  • Linformer: Self-Attention with Linear Complexity [Linformer]
  • A Multigrid Method for Efficiently Training Video Models [Multigrid]

GIRAFFE is a newly introduced GAN that can generate scenes via composition with minimal supervision [GIRAFFE]. Generative methods can implicitly learn interpretable representations, as can be seen in GAN image interpretations [GANSpace, GanLatentDiscovery]. Decoding GIRAFFE could give us per-object interpretable representations that could be used for scene manipulation, data augmentation, scene understanding, semantic segmentation, pose estimation [iNeRF], and more.

In order to invert a GIRAFFE model, we will first train the generative model on the Clevr and CompCars datasets, then add a decoder to the pipeline and train this autoencoder. We can make the task easier by knowing the number of objects in the scene and/or knowing their positions.
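For the inversion itself, the simplest starting point is optimization-based: freeze the trained generator and fit a latent code to a target image by gradient descent. A generic, hedged sketch follows; the generator interface, latent size and reconstruction loss are placeholders, not GIRAFFE's actual API.

    import torch

    def invert_image(generator, target, latent_dim=256, steps=500, lr=0.05):
        # Optimization-based inversion: find z such that generator(z) reproduces the target.
        z = torch.zeros(1, latent_dim, requires_grad=True)
        optimizer = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            optimizer.zero_grad()
            loss = torch.nn.functional.mse_loss(generator(z), target)
            loss.backward()
            optimizer.step()
        return z.detach()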

Goals:  

Scene Manipulation and Decomposition by Inverting the GIRAFFE 

Publication Opportunity:  DynaVis 2021 (a CVPR workshop on Dynamic Scene Reconstruction)  

Related Papers: 

  • GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [GIRAFFE] 
  • Neural Scene Graphs for Dynamic Scenes 
  • pixelNeRF: Neural Radiance Fields from One or Few Images [pixelNeRF] 
  • NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis [NeRF] 
  • Neural Volume Rendering: NeRF And Beyond 
  • GANSpace: Discovering Interpretable GAN Controls [GANSpace] 
  • Unsupervised Discovery of Interpretable Directions in the GAN Latent Space [GanLatentDiscovery] 
  • Inverting Neural Radiance Fields for Pose Estimation [iNeRF] 

Quantized ViT

Visual Transformers have obtained state-of-the-art classification accuracies [ViT, CLIP, DeiT], but the best ViT models are extremely compute-heavy, and running them even only for inference (without backpropagation) is expensive. Running transformers cheaply via quantization is not a new problem, and it has been tackled before for BERT [BERT] in NLP [Q-BERT, Q8BERT, TernaryBERT, BinaryBERT]. In this project we will try to quantize pretrained ViT models.
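As a first, very cheap baseline before any ViT-specific work, post-training dynamic quantization of the linear layers can be tried directly in PyTorch; in the sketch below, torchvision's vit_b_16 is used purely as an illustrative stand-in for the models of interest.

    import torch
    import torchvision

    model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1").eval()

    # Quantize only the nn.Linear layers (they dominate ViT compute) to int8 for inference.
    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        print(quantized(x).argmax(dim=1))   # same interface, smaller and cheaper model

Measuring the accuracy drop and throughput gain against the fp32 baseline would then indicate how much of the quantization effort remains for the attention-specific parts.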

Quantizing ViT models for faster inference and smaller models without losing accuracy 

Publication Opportunity:  Binary Networks for Computer Vision 2021 (a CVPR workshop)  

Extensions:  

  • Having a fast pipeline for image inference with ViT will allow us to dig deep into the attention of ViT and analyze it; we might be able to prune some attention heads or replace them with static patterns (like local convolutions or dilated patterns). We might even be able to replace the transformer with a Performer and increase the throughput even more [Performer].
  • The same idea could be extended to other ViT based models [DETR, SETR, LSTR, TrackFormer, CPTR, BoTNet, T2TViT] 
  • Learning Transferable Visual Models From Natural Language Supervision [CLIP] 
  • Visual Transformers: Token-based Image Representation and Processing for Computer Vision [ViT] 
  • DeiT: Data-efficient Image Transformers [DeiT] 
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [BERT] 
  • Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT [Q-BERT] 
  • Q8BERT: Quantized 8Bit BERT [Q8BERT] 
  • TernaryBERT: Distillation-aware Ultra-low Bit BERT [TernaryBERT] 
  • BinaryBERT: Pushing the Limit of BERT Quantization [BinaryBERT] 
  • Rethinking Attention with Performers [Performer] 
  • End-to-End Object Detection with Transformers [DETR] 
  • Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [SETR] 
  • End-to-end Lane Shape Prediction with Transformers [LSTR] 
  • TrackFormer: Multi-Object Tracking with Transformers [TrackFormer] 
  • CPTR: Full Transformer Network for Image Captioning [CPTR] 
  • Bottleneck Transformers for Visual Recognition [BoTNet] 
  • Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet [T2TViT] 

Multimodal Contrastive Learning

Recently, contrastive learning has gained a lot of attention for self-supervised image representation learning [SimCLR, MoCo]. Contrastive learning can be extended to multimodal data, like videos (images and audio) [CMC, CoCLR]. Most contrastive methods require large batch sizes (or large memory pools), which makes them expensive to train. In this project we are going to use contrastive methods that do not depend on the batch size [SwAV, BYOL, SimSiam] to train multimodal representation extractors.

Our main goal is to compare the proposed method with the CMC baseline, so we will be working with STL10, ImageNet, UCF101, HMDB51, and NYU Depth-V2 datasets. 

Inspired by the recent works on smaller datasets [ConVIRT, CPD], to accelerate the training speed, we could start with two pretrained single-modal models and finetune them with the proposed method.  
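To make the batch-size independence concrete, a BYOL-style cross-modal objective between, say, a video branch and an audio branch could be sketched as follows; the predictor/target split and the stop-gradient follow BYOL, while the wiring of the two modalities is an assumption made for the example.

    import torch
    import torch.nn.functional as F

    def byol_cross_modal_loss(video_prediction, audio_target):
        # video_prediction: output of the online video encoder + predictor, shape (B, D)
        # audio_target: output of the (momentum) target audio encoder, shape (B, D)
        v = F.normalize(video_prediction, dim=-1)
        a = F.normalize(audio_target.detach(), dim=-1)   # stop-gradient on the target branch
        return (2 - 2 * (v * a).sum(dim=-1)).mean()      # no negatives, so no large batch needed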

  • Extending SwAV to multimodal datasets 
  • Grasping a better understanding of the BYOL 

Publication Opportunity:  MULA 2021 (a CVPR workshop on Multimodal Learning and Applications)  

  • Most knowledge distillation methods for contrastive learners also use large batch sizes (or memory pools) [CRD, SEED]; the proposed method could be extended to knowledge distillation.
  • One could easily extend this idea to multiview learning; for example, one could have two different networks working on the same input and train them with contrastive learning. This may lead to better models [DeiT] through cross-model communication of inductive biases.
  • Self-supervised Co-training for Video Representation Learning [CoCLR] 
  • Learning Spatiotemporal Features via Video and Text Pair Discrimination [CPD] 
  • Audio-Visual Instance Discrimination with Cross-Modal Agreement [AVID-CMA] 
  • Self-Supervised Learning by Cross-Modal Audio-Video Clustering [XDC] 
  • Contrastive Multiview Coding [CMC] 
  • Contrastive Learning of Medical Visual Representations from Paired Images and Text [ConVIRT] 
  • A Simple Framework for Contrastive Learning of Visual Representations [SimCLR] 
  • Momentum Contrast for Unsupervised Visual Representation Learning [MoCo] 
  • Bootstrap your own latent: A new approach to self-supervised Learning [BYOL] 
  • Exploring Simple Siamese Representation Learning [SimSiam] 
  • Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [SwAV] 
  • Contrastive Representation Distillation [CRD] 
  • SEED: Self-supervised Distillation For Visual Representation [SEED] 

Robustness of Neural Networks

Neural Networks have been found to achieve surprising performance in several tasks such as classification, detection and segmentation. However, they are also very sensitive to small (controlled) changes to the input. It has been shown that some changes to an image that are not visible to the naked eye may lead the network to output an incorrect label. This thesis will focus on studying recent progress in this area and aims to build a procedure for a trained network to self-assess its reliability in classification or one of the other popular computer vision tasks.
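The phenomenon is easy to reproduce; below is a minimal sketch of the classic fast gradient sign method (FGSM), which could serve as a starting point for probing a trained classifier (the perturbation budget is an illustrative value).

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, images, labels, eps=8 / 255):
        # One-step attack: move each pixel by +-eps along the gradient of the loss.
        images = images.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(images), labels)
        loss.backward()
        adversarial = images + eps * images.grad.sign()
        return adversarial.clamp(0, 1).detach()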

Contact: Paolo Favaro

Master's projects at the sitem Center

The Personalised Medicine Research Group at the sitem Center for Translational Medicine and Biomedical Entrepreneurship is offering multiple MSc thesis projects to biomedical engineering MSc students; they may also be of interest to computer science students.

  • Automated quantification of cartilage quality for hip treatment decision support. PDF
  • Automated quantification of massive rotator cuff tears from MRI. PDF
  • Deep learning-based segmentation and fat fraction analysis of the shoulder muscles using quantitative MRI. PDF
  • Unsupervised Domain Adaption for Cross-Modality Hip Joint Segmentation. PDF

Contact: Dr. Kate Gerber

Internships/Master thesis @ Chronocam

3-6 month internships on event-based computer vision. Chronocam is a rapidly growing startup developing event-based technology, with more than 15 PhDs working on problems like tracking, detection, classification, SLAM, etc. Event-based computer vision has the potential to solve many long-standing problems in traditional computer vision, and this is a super exciting time as this potential is becoming more and more tangible in many real-world applications. For next year we are looking for motivated Master and PhD students with good software engineering skills (C++ and/or Python), and preferably a good computer vision and deep learning background. PhD internships will be more research focused and may lead to a publication. For each intern we offer compensation to cover the expenses of living in Paris. A list of some of the topics we want to explore:

  • Photo-realistic image synthesis and super-resolution from event-based data (PhD)
  • Self-supervised representation learning (PhD)
  • End-to-end Feature Learning for Event-based Data
  • Bio-inspired Filtering using Spiking Networks
  • On-the fly Compression of Event-based Streams for Low-Power IoT Cameras
  • Tracking of Multiple Objects with a Dual-Frequency Tracker
  • Event-based Autofocus
  • Stabilizing an Event-based Stream using an IMU
  • Crowd Monitoring for Low-power IoT Cameras
  • Road Extraction from an Event-based Camera Mounted in a Car for Autonomous Driving
  • Sign detection from an Event-based Camera Mounted in a Car for Autonomous Driving
  • High-frequency Eye Tracking

Email with attached CV to Daniele Perrone at  [email protected] .

Contact: Daniele Perrone

Object Detection in 3D Point Clouds

Today we have many 3D scanning techniques that allow us to capture the shape and appearance of objects. It is easier than ever to scan real 3D objects and transform them into a digital model for further processing, such as modeling, rendering or animation. However, the output of a 3D scanner is often a raw point cloud with little to no annotations. The unstructured nature of the point cloud representation makes it difficult for processing, e.g. surface reconstruction. One application is the detection and segmentation of an object of interest.  In this project, the student is challenged to design a system that takes a point cloud (a 3D scan) as input and outputs the names of objects contained in the scan. This output can then be used to eliminate outliers or points that belong to the background. The approach involves collecting a large dataset of 3D scans and training a neural network on it.

Contact: Adrian Wälchli

Shape Reconstruction from a Single RGB Image or Depth Map

A photograph accurately captures the world in a moment of time and from a specific perspective. Since it is a projection of the 3D space to a 2D image plane, the depth information is lost. Is it possible to restore it, given only a single photograph? In general, the answer is no. This problem is ill-posed, meaning that many different plausible depth maps exist, and there is no way of telling which one is the correct one.  However, if we cover one of our eyes, we are still able to recognize objects and estimate how far away they are. This motivates the exploration of an approach where prior knowledge can be leveraged to reduce the ill-posedness of the problem. Such a prior could be learned by a deep neural network, trained with many images and depth maps.

CNN Based Deblurring on Mobile

Deblurring finds many applications in our everyday life. It is particularly useful when taking pictures on handheld devices (e.g. smartphones) where camera shake can degrade important details. Therefore, it is desired to have a good deblurring algorithm implemented directly in the device.  In this project, the student will implement and optimize a state-of-the-art deblurring method based on a deep neural network for deployment on mobile phones (Android).  The goal is to reduce the number of network weights in order to reduce the memory footprint while preserving the quality of the deblurred images. The result will be a camera app that automatically deblurs the pictures, giving the user a choice of keeping the original or the deblurred image.

Depth from Blur

If an object in front of the camera or the camera itself moves while the aperture is open, the region of motion becomes blurred because the incoming light is accumulated in different positions across the sensor. If there is camera motion, there is also parallax. Thus, a motion blurred image contains depth information.  In this project, the student will tackle the problem of recovering a depth-map from a motion-blurred image. This includes the collection of a large dataset of blurred- and sharp images or videos using a pair or triplet of GoPro action cameras. Two cameras will be used in stereo to estimate the depth map, and the third captures the blurred frames. This data is then used to train a convolutional neural network that will predict the depth map from the blurry image.

Unsupervised Clustering Based on Pretext Tasks

The idea of this project is that we have two types of neural networks that work together: There is one network A that assigns images to k clusters and k (simple) networks of type B perform a self-supervised task on those clusters. The goal of all the networks is to make the k networks of type B perform well on the task. The assumption is that clustering in semantically similar groups will help the networks of type B to perform well. This could be done on the MNIST dataset with B being linear classifiers and the task being rotation prediction.

Adversarial Data-Augmentation

The student designs a data augmentation network that transforms training images in such a way that image realism is preserved (e.g. with a constrained spatial transformer network) and the transformed images are more difficult to classify (trained via adversarial loss against an image classifier). The model will be evaluated for different data settings (especially in the low data regime), for example on the MNIST and CIFAR datasets.

Unsupervised Learning of Lip-reading from Videos

People with sensory impairment (hearing, speech, vision) depend heavily on assistive technologies to communicate and navigate in everyday life. The mass production of media content today makes it impossible to manually translate everything into a common language for assistive technologies, e.g. captions or sign language.  In this project, the student employs a neural network to learn a representation for lip-movement in videos in an unsupervised fashion, possibly with an encoder-decoder structure where the decoder reconstructs the audio signal. This requires collecting a large dataset of videos (e.g. from YouTube) of speakers or conversations where lip movement is visible. The outcome will be a neural network that learns an audio-visual representation of lip movement in videos, which can then be leveraged to generate captions for hearing impaired persons.

Learning to Generate Topographic Maps from Satellite Images

Satellite images have many applications, e.g. in meteorology, geography, education, cartography and warfare. They are an accurate and detailed depiction of the surface of the earth from above. Although it is relatively simple to collect many satellite images in an automated way, challenges arise when processing them for use in navigation and cartography. The idea of this project is to automatically convert an arbitrary satellite image, of e.g. a city, to a map of simple 2D shapes (streets, houses, forests) and label them with colors (semantic segmentation). The student will collect a dataset of satellite image and topological maps and train a deep neural network that learns to map from one domain to the other. The data could be obtained from a Google Maps database or similar.

Optimization of OmniMotion, a tracking algorithm

Martí Farré Farrús · June 2024.

This thesis presents Quasi-OmniFastTrack, an improved version of the OmniMotion algorithm for long-term pixel tracking in videos. The key contribution is reducing the computational expense and training time of OmniMotion while maintaining comparable tracking performance. The main bottleneck in OmniMotion was identified to be the NeRF network used for 3D scene representation. Quasi-OmniFastTrack replaces this with a pre-trained depth estimation model, significantly reducing training time, based on the work introduced in OmniFastTrack, hence the name. The invertible neural network for mapping between local and canonical coordinates is retained, but optimized depths are used to lift 2D pixels to 3D. Experiments show that Quasi-OmniFastTrack reduces training time by over 50% compared to OmniMotion while achieving similar qualitative tracking results on sequences with occlusions. Performance degrades somewhat on fast-moving scenes. The ablation studies demonstrate the importance of optimizing the initial depth estimates during training. While not matching OmniMotion's robustness in all scenarios, Quasi-OmniFastTrack offers a compelling speed-accuracy tradeoff, enabling long-term tracking on more videos in practical timeframes. Future work on incorporating other modifications introduced in OmniFastTrack, like long-term semantic features, could further improve tracking consistency.

New Variables of Brain Morphometry: the Potential and Limitations of CNN Regression

Timo Blattner · Sept. 2022.

The calculation of variables of brain morphology is computationally very expensive and time-consuming. A previous work showed the feasibility of extracting the variables directly from T1-weighted brain MRI images using a convolutional neural network. We used significantly more data and extended their model to a new set of neuromorphological variables, which could become interesting biomarkers in the future for the diagnosis of brain diseases. The model shows for nearly all subjects a less than 5% mean relative absolute error. This high relative accuracy can be attributed to the low morphological variance between subjects and the ability of the model to predict the cortical atrophy age trend. The model however fails to capture all the variance in the data and shows large regional differences. We attribute these limitations in part to the moderate to poor reliability of the ground truth generated by FreeSurfer. We further investigated the effects of training data size and model complexity on this regression task and found that the size of the dataset had a significant impact on performance, while deeper models did not perform better. Lack of interpretability and dependence on a silver ground truth are the main drawbacks of this direct regression approach.

Home Monitoring by Radar

Lars Ziegler · Sept. 2022.

Detection and tracking of humans via UWB radars is a promising and continuously evolving field with great potential for medical technology. This contactless method of acquiring data on a patient's movement patterns is ideal for in-home application. As irregularities in a patient's movement patterns are an indicator of various health problems, including neurodegenerative diseases, the insight this data could provide may enable earlier detection of such problems. In this thesis a signal processing pipeline is presented with which a person's movement is modeled. During an experiment, 142 measurements were recorded by two separate radar systems and one lidar system, each of which consisted of multiple sensors. The models that were calculated on these measurements by the signal processing pipeline were used to predict the times when a person stood up or sat down. The predictions showed an accuracy of 72.2%.

Revisiting non-learning based 3D reconstruction from multiple images

Aaron Sägesser · Oct. 2021.

Arthroscopy consists of challenging tasks and requires skills that even today young surgeons still train directly during surgery. Existing simulators are expensive and rarely available. Through the growing potential of virtual reality (VR) (head-mounted) devices for simulation and their applicability in the medical context, these devices have become a promising alternative that would be orders of magnitude cheaper and could be made widely available. To build a VR-based training device for arthroscopy is the overall aim of our project, as this would be of great benefit and might even be applicable to other minimally invasive surgery (MIS). This thesis marks a first step of the project, with its focus on exploring and comparing well-known algorithms in a multi-view stereo (MVS) based 3D reconstruction with respect to imagery acquired by an arthroscopic camera. Simultaneously with this reconstruction, we aim to gain essential measures to compare the VR environment to the real world, as validation of the realism of future VR tasks. We evaluate 3 different feature extraction algorithms with 3 different matching techniques and 2 different algorithms for the estimation of the fundamental (F) matrix. The evaluation of these 18 different setups is made with a reconstruction pipeline embedded in a Jupyter notebook implemented in Python based on common computer vision libraries and compared with imagery generated with a mobile phone as well as with the reconstruction results of the state-of-the-art (SOTA) structure-from-motion (SfM) software COLMAP and Multi-View Environment (MVE). Our comparative analysis manifests the challenges of heavy distortion, the fish-eye shape and weak image quality of arthroscopic imagery, as all results are substantially worse using this data. However, there are huge differences regarding the different setups. Scale-Invariant Feature Transform (SIFT) and Oriented FAST and Rotated BRIEF (ORB) in combination with k-Nearest Neighbour (kNN) matching and Least Median of Squares (LMedS) present the most promising results. Overall, the 3D reconstruction pipeline is a useful tool to foster the process of gaining measurements from the arthroscopic exploration device and to complement the comparative research in this context.

Examination of Unsupervised Representation Learning by Predicting Image Rotations

Eric Lagger · Sept. 2020.

In recent years deep convolutional neural networks have achieved a lot of progress. To train such a network a lot of data is required, and in supervised learning algorithms it is necessary that the data is labeled. Labeling data requires a lot of human work, which takes a lot of time and money. To avoid these inconveniences we would like to find systems that do not need labeled data and are therefore unsupervised learning algorithms. This is the importance of unsupervised algorithms, even though their outcome is not yet on the same qualitative level as that of supervised algorithms. In this thesis we discuss an approach of such a system and compare the results to other papers. A deep convolutional neural network is trained to learn the rotations that have been applied to a picture: we take a large amount of images, apply some simple rotations, and the task of the network is to discover in which direction the image has been rotated. The data does not need to be labeled with any category. As long as all the pictures are upright, we hope to find some high-dimensional patterns for the network to learn.

StitchNet: Image Stitching using Autoencoders and Deep Convolutional Neural Networks

Maurice Rupp · Sept. 2019.

This thesis explores the prospect of artificial neural networks for image processing tasks. More specifically, it aims to achieve the goal of stitching multiple overlapping images to form a bigger, panoramic picture. Until now, this task has been solely approached with "classical", hardcoded algorithms, while deep learning is at most used for specific subtasks. This thesis introduces a novel end-to-end neural network approach to image stitching called StitchNet, which uses a pre-trained autoencoder and deep convolutional networks. In addition to presenting several new datasets for the task of supervised image stitching, each with 120,000 training and 5,000 validation samples, this thesis also conducts various experiments with different kinds of existing networks designed for image super-resolution and image segmentation, adapted to the task of image stitching. StitchNet outperforms most of the adapted networks in both quantitative as well as qualitative results.

Facial Expression Recognition in the Wild

Luca Rolshoven · Sept. 2019.

The idea of inferring the emotional state of a subject by looking at their face is nothing new. Neither is the idea of automating this process using computers. Researchers used to computationally extract handcrafted features from face images that had proven themselves to be effective and then used machine learning techniques to classify the facial expressions using these features. Recently, there has been a trend towards using deep learning and especially Convolutional Neural Networks (CNNs) for the classification of these facial expressions. Researchers were able to achieve good results on images that were taken in laboratories under the same or at least similar conditions. However, these models do not perform very well on more arbitrary face images with different head poses and illumination. This thesis aims to show the challenges of Facial Expression Recognition (FER) in this wild setting. It presents the currently used datasets and the present state-of-the-art results on one of the biggest facial expression datasets currently available. The contributions of this thesis are twofold. Firstly, I analyze three famous neural network architectures and their effectiveness on the classification of facial expressions. Secondly, I present two modifications of one of these networks that lead to the proposed STN-COV model. While this model does not outperform all of the current state-of-the-art models, it does beat several of them.

A Study of 3D Reconstruction of Varying Objects with Deformable Parts Models

Raoul Grossenbacher · July 2019.

This work covers a new approach to 3D reconstruction. In traditional 3D reconstruction one uses multiple images of the same object to calculate a 3D model by taking information gained from the differences between the images, like camera position, illumination of the images, rotation of the object and so on, to compute a point cloud representing the object. The characteristic trait shared by all these approaches is that one can almost change everything about the image, but it is not possible to change the object itself, because one needs to find correspondences between the images. To be able to use different instances of the same object, we used a 3D DPM model that can find different parts of an object in an image, thereby detecting the correspondences between the different pictures, which we then can use to calculate the 3D model. To take this theory to practice, we gave a 3D DPM model, which was trained to detect cars, pictures of different car brands, where no pair of images showed the same vehicle, and used the detected correspondences and the Factorization Method to compute the 3D point cloud. This technique leads to a completely new approach to 3D reconstruction, because changing the object itself was never done before.

Motion Deblurring in the Wild: Replication and Improvements

Alvaro Juan Lahiguera · Jan. 2019.

Coma Outcome Prediction with Convolutional Neural Networks

Stefan Jonas · Oct. 2018.

Automatic Correction of Self-Introduced Errors in Source Code

Sven Kellenberger · Aug. 2018.

Neural Face Transfer: Training a Deep Neural Network to Face-Swap

Till Nikolaus Schnabel · July 2018.

This thesis explores the field of artificial neural networks with realistic looking visual outputs. It aims at morphing face pictures of a specific identity to look like another individual by only modifying key features, such as eye color, while leaving identity-independent features unchanged. Prior works have covered the topic of symmetric translation between two specific domains but failed to optimize it on faces where only parts of the image may be changed. This work applies a face masking operation to the output at training time, which forces the image generator to preserve colors while altering the face, fitting it naturally inside the unmorphed surroundings. Various experiments are conducted, including an ablation study on the final setting, decreasing the baseline identity switching performance from 81.7% to 75.8% whilst improving the average χ2 color distance from 0.551 to 0.434. The provided code-based software gives users easy access to apply this neural face swap to images and videos of arbitrary crop and brings Computer Vision one step closer to replacing Computer Graphics in this specific area.

A Study of the Importance of Parts in the Deformable Parts Model

Sammer Puran · June 2017.

Self-Similarity as a Meta Feature

Lucas Husi · April 2017.

A Study of 3D Deformable Parts Models for Detection and Pose-Estimation

Simon Jenni · March 2015.

Accelerated Federated Learning on Client Silos with Label Noise: RHO Selection in Classification and Segmentation

Irakli Kelbakiani · May 2024.

Federated Learning has recently gained more research interest. This increased attention is caused by factors including the growth of decentralized data, privacy concerns, and new privacy regulations. In Federated Learning, remote servers keep training a model on local datasets independently, and subsequently, local models are aggregated into a global model, which achieves better overall performance. Sending local model weights instead of the entire dataset is a significant advantage of Federated Learning over centralized classical machine learning algorithms. Federated Learning involves uploading and downloading model parameters multiple times, so there are multiple communication rounds between the global server and remote client servers, which imposes challenges. The high number of necessary communication rounds not only increases high-cost communication overheads but is also a critical limitation for servers with low network bandwidth, which leads to latency and a higher probability of training failures caused by communication breakdowns. To mitigate these challenges, we aim to provide a fast-convergent Federated Learning training methodology that decreases the number of necessary communication rounds. We found a paper on the Reducible Holdout Loss Selection (RHO-Loss) batch selection methodology, which "selects low-noise, task-relevant, non-redundant points for training" [1]; we hypothesize that if client silos employ the RHO-Loss methodology and successfully avoid training their local models on noisy and non-relevant samples, clients may offer stable and consistent updates to the global server, which could lead to faster convergence of the global model. Our contribution focuses on investigating the RHO-Loss method in a simulated federated setting for the Clothing1M dataset. We also examine its applicability to medical datasets and check its effectiveness in a simulated federated environment. Our experimental results show a promising outcome, specifically a reduction in communication rounds for the Clothing1M dataset. However, as the success of the RHO-Loss selection method depends on the availability of sufficient training data for the target RHO model and for the irreducible RHO model, we emphasize that our contribution applies to those Federated Learning scenarios where client silos hold enough training data to successfully train and benefit from their RHO model on their local dataset.

Amodal Leaf Segmentation

Nicolas maier · nov. 2023.

Plant phenotyping is the process of measuring and analyzing various traits of plants. It provides essential information on how genetic and environmental factors affect plant growth and development. Manual phenotyping is highly time-consuming; therefore, many computer vision and machine learning based methods have been proposed in the past years to perform this task automatically based on images of the plants. However, the publicly available datasets (in particular, of Arabidopsis thaliana) are limited in size and diversity, making them unsuitable to generalize to new unseen environments. In this work, we propose a complete pipeline able to automatically extract traits of interest from an image of Arabidopsis thaliana. Our method uses a minimal amount of existing annotated data from a source domain to generate a large synthetic dataset adapted to a different target domain (e.g., different backgrounds, lighting conditions, and plant layouts). In addition, unlike the source dataset, the synthetic one provides ground-truth annotations for the occluded parts of the leaves, which are relevant when measuring some characteristics of the plant, e.g., its total area. This synthetic dataset is then used to train a model to perform amodal instance segmentation of the leaves to obtain the total area, leaf count, and color of each plant. To validate our approach, we create a small dataset composed of manually annotated real images of Arabidopsis thaliana, which is used to assess the performance of the models.

Assessment of movement and pose in a hospital bed by ambient and wearable sensor technology in healthy subjects

Tony Licata · Sept. 2022.

The use of automated systems describing human motion has become possible in various domains. Most of the proposed systems are designed to work with people moving around in a standing position. Because such a system could be interesting in a medical environment, we propose in this work a pipeline that can effectively predict human motion from people lying in beds. The proposed pipeline is tested with a dataset composed of 41 participants executing 7 predefined tasks in a bed. The motion of the participants is measured with video cameras, accelerometers and a pressure mat. Various experiments are carried out with the information retrieved from the dataset. Two approaches combining the data from the different measurement technologies are explored. The performance of the different experiments is measured, and the proposed pipeline is composed of the components providing the best results. Later on, we show that the proposed pipeline only needs the video cameras, which makes the proposed environment easier to implement in real-life situations.

Machine Learning Based Prediction of Mental Health Using Wearable-measured Time Series

Seyedeh Sharareh Mirzargar · Sept. 2022.

Depression is the second major cause for years spent in disability and has a growing prevalence in adolescents. The recent Covid-19 pandemic has intensified the situation and limited in-person patient monitoring due to distancing measures. Recent advances in wearable devices have made it possible to record the rest/activity cycle remotely with high precision and in real-world contexts. We aim to use machine learning methods to predict an individual's mental health based on wearable-measured sleep and physical activity. Predicting an impending mental health crisis of an adolescent allows for prompt intervention, detection of depression onset or its recursion, and remote monitoring. To achieve this goal, we train three primary forecasting models: linear regression, random forest, and light gradient boosted machine (LightGBM); and two deep learning models: block recurrent neural network (block RNN) and temporal convolutional network (TCN); on Actigraph measurements to forecast mental health in terms of depression, anxiety, sleepiness, stress, sleep quality, and behavioral problems. Our models achieve a high forecasting performance, with the random forest being the winner, reaching an accuracy of 98% for forecasting trait anxiety. We perform extensive experiments to evaluate the models' performance in accuracy, generalization, and feature utilization, using a naive forecaster as the baseline. Our analysis shows minimal mental health changes over two months, making the prediction task easily achievable. Due to these minimal changes in mental health, the models tend to primarily use the historical values of the mental health evaluation instead of Actigraph features. At the time of this master thesis, the data acquisition step is still in progress. In future work, we plan to train the models on the complete dataset using a longer forecasting horizon to increase the level of mental health changes and perform transfer learning to compensate for the small dataset size. This interdisciplinary project demonstrates the opportunities and challenges in machine learning based prediction of mental health, paving the way toward using the same techniques to forecast other mental disorders such as internalizing disorder, Parkinson's disease, Alzheimer's disease, etc., and improving the quality of life for individuals who have a mental disorder.

CNN Spike Detector: Detection of Spikes in Intracranial EEG using Convolutional Neural Networks

Stefan Jonas · Oct. 2021.

The detection of interictal epileptiform discharges in the visual analysis of electroencephalography (EEG) is an important but very difficult, tedious, and time-consuming task. There have been decades of research on computer-assisted detection algorithms, most recently focused on Convolutional Neural Networks (CNNs). In this thesis, we present the CNN Spike Detector, a convolutional neural network to detect spikes in intracranial EEG. Our dataset of 70 intracranial EEG recordings from 26 subjects with epilepsy introduces new challenges to this research field. We report cross-validation results with a mean AUC of 0.926 (±0.04), an area under the precision-recall curve (AUPRC) of 0.652 (±0.10), and 12.3 (±7.47) false positive epochs per minute at a sensitivity of 80%. A visual examination of false positive segments is performed to understand the model behavior leading to this relatively high false detection rate. We note issues with the evaluation measures and highlight a major limitation of the common approach of detecting spikes in short segments, namely that the network cannot consider the greater context from which a segment originates. For this reason, we present the Context Model, an extension in which the CNN Spike Detector is supplied with additional information about the channel. Results show promising but limited performance improvements. This thesis provides important findings about the spike detection task for intracranial EEG and lays out promising future research directions toward a network capable of assisting experts in real-world clinical applications.

PolitBERT - Deepfake Detection of American Politicians using Natural Language Processing

Maurice Rupp · April 2021.

This thesis explores the application of modern Natural Language Processing techniques to the detection of artificially generated videos of popular American politicians. Instead of focusing on detecting anomalies and artifacts in images and sounds, this thesis focuses on detecting irregularities and inconsistencies in the words themselves, opening up a new possibility for detecting fake content. A novel, domain-adapted, pre-trained version of the language model BERT, combined with several mechanisms to overcome severe dataset imbalance, yielded the best quantitative as well as qualitative results. In addition to creating the biggest publicly available dataset of English-speaking politicians, consisting of 1.5 M sentences from over 1,000 persons, this thesis conducts various experiments with different kinds of text classification and sequence processing algorithms applied to the political domain. Furthermore, multiple ablations for managing severe data imbalance are presented and evaluated.
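For illustration, the following is a minimal sketch of fine-tuning a pre-trained BERT model for binary sequence classification with a class-weighted loss, using the Hugging Face transformers library. The model name, example sentences, labels and hyperparameters are placeholders and not the PolitBERT setup itself.

```python
# Minimal sketch: fine-tuning a pre-trained BERT model for binary text
# classification with a class-weighted loss, in the spirit of the approach
# described above. Model name, data and hyperparameters are illustrative only.
import torch
from torch import nn
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

sentences = ["example sentence spoken by a politician", "another example sentence"]
labels = torch.tensor([0, 1])  # hypothetical labels: 0 = genuine, 1 = generated

batch = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors="pt")

# Weighted cross-entropy as one simple way to counter class imbalance.
class_weights = torch.tensor([1.0, 10.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
logits = model(**batch).logits
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```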

A Study on the Inversion of Generative Adversarial Networks

Ramona Beck · March 2021.

The desire to use generative adversarial networks (GANs) for real-world tasks such as object segmentation or image manipulation is increasing as synthesis quality improves, which has given rise to an emerging research area called GAN inversion that focuses on exploring methods for embedding real images into the latent space of a GAN. In this work, we investigate different GAN inversion approaches using an existing generative model architecture that takes a completely unsupervised approach to object segmentation and is based on StyleGAN2. In particular, we propose and analyze algorithms for embedding real images into the different latent spaces Z, W, and W+ of StyleGAN following an optimization-based inversion approach, while also investigating a novel approach that allows fine-tuning of the generator during the inversion process. Furthermore, we investigate a hybrid and a learning-based inversion approach, where in the former we train an encoder with embeddings optimized by our best optimization-based inversion approach, and in the latter we define an autoencoder, consisting of an encoder and the generator of our generative model as a decoder, and train it to map an image into the latent space. We demonstrate the effectiveness of our methods as well as their limitations through a quantitative comparison with existing inversion methods and by conducting extensive qualitative and quantitative experiments with synthetic data as well as real images from a complex image dataset. We show that we achieve qualitatively satisfying embeddings in the W and W+ spaces with our optimization-based algorithms, that fine-tuning the generator during the inversion process leads to qualitatively better embeddings in all latent spaces studied, and that the learning-based approach also benefits from a variable generator as well as a pre-training with our hybrid approach. Furthermore, we evaluate our approaches on the object segmentation task and show that both our optimization-based and our hybrid and learning-based methods are able to generate meaningful embeddings that achieve reasonable object segmentations. Overall, our proposed methods illustrate the potential that lies in the GAN inversion and its application to real-world tasks, especially in the relaxed version of the GAN inversion where the weights of the generator are allowed to vary.
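A minimal sketch of the optimization-based inversion loop described above is given below, assuming a pre-trained generator that maps latent codes to images. The names and loss are placeholders, not the thesis implementation; a perceptual loss is typically added in practice.

```python
# Sketch of optimization-based GAN inversion: optimize a latent code so that the
# generator reproduces a given target image. `generator` stands in for a
# pre-trained StyleGAN2-like model (a placeholder, not the thesis code).
import torch
import torch.nn.functional as F

def invert(generator, target, latent_dim=512, steps=1000, lr=0.01, finetune_generator=False):
    latent = torch.randn(1, latent_dim, requires_grad=True)
    params = [latent]
    if finetune_generator:  # optionally relax the inversion by also tuning G
        params += list(generator.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        recon = generator(latent)            # assumed to map latent -> image
        loss = F.mse_loss(recon, target)     # a perceptual loss is often added
        loss.backward()
        optimizer.step()
    return latent.detach()
```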

Multi-scale Momentum Contrast for Self-supervised Image Classification

Zhao Xueqi · Dec. 2020.

With the maturity of supervised learning, research focus has gradually shifted to self-supervised learning. "Momentum Contrast" (MoCo) proposed a new self-supervised learning method and raised the accuracy of self-supervised learning to a new level. Inspired by the article "Representation Learning by Learning to Count", we hypothesize that dividing a picture into four parts and passing them through a neural network can further improve the accuracy of MoCo. Different from the original MoCo, this MoCo variant (Multi-scale MoCo) does not directly pass the augmented image through the encoder. Instead, Multi-scale MoCo crops and resizes the augmented image, and the four resulting parts are passed through the encoder separately and then summed (the upsampled version does not resize the input but resizes the contrastive samples). This cropping is applied not only to the query q but also to the key queue k, since otherwise the weights of the key encoder might be damaged during the momentum update. This is further discussed in the experiments chapter, comparing the downsampled Multi-scale version with the version in which both query and contrast samples are downsampled. Human object recognition follows the same principle: when humans see something familiar, even if the object is not fully visible, they can still guess the object itself with high probability. Multi-scale MoCo applies this idea to the pretext part of MoCo, hoping to obtain better feature extraction. In this thesis, there are three versions of Multi-scale MoCo: a downsampled-input-samples version, a downsampled-input-and-contrast-samples version, and an upsampled-input-samples version. The differences between these versions are described in more detail later. The neural network architecture used for comparison is ResNet50, and the evaluation dataset is STL-10. The weights obtained in the pretext stage are transferred to the downstream evaluation, where the weights of all layers except the final linear layer are frozen (these weights come from the pretext stage).
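A minimal sketch of the four-crop feature extraction described above is shown below, assuming a generic PyTorch encoder; the MoCo queue and momentum update are omitted, and the cropping scheme is an illustrative simplification.

```python
# Sketch of the multi-scale idea described above: split an augmented image into
# four crops, encode each crop separately, and sum the resulting features.
# `encoder` is a placeholder for the MoCo query/key encoder; the queue and the
# momentum update are omitted for brevity.
import torch
import torch.nn.functional as F

def multi_scale_features(encoder, images, out_size=224):
    # images: (B, 3, H, W) batch of augmented images
    _, _, h, w = images.shape
    crops = [
        images[:, :, : h // 2, : w // 2],   # top-left
        images[:, :, : h // 2, w // 2 :],   # top-right
        images[:, :, h // 2 :, : w // 2],   # bottom-left
        images[:, :, h // 2 :, w // 2 :],   # bottom-right
    ]
    feats = 0
    for crop in crops:
        crop = F.interpolate(crop, size=(out_size, out_size), mode="bilinear", align_corners=False)
        feats = feats + encoder(crop)       # features of the four parts are summed
    return feats
```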

Self-Supervised Learning Using Siamese Networks and Binary Classifier

Dušan Mihajlov · March 2020.

In this thesis, we present several approaches for training a convolutional neural network using only unlabeled data. Our self-supervised learning algorithms are based on the connection between an image patch, i.e., a zoomed-in crop, and its original image. Using a siamese neural network architecture, we aim to recognize whether the image patch, which is the input to the first branch, comes from the same image presented to the second branch. By applying transformations to both images, and different zoom sizes at different positions, we force the network to extract high-level features with its convolutional layers. On top of our siamese architecture, a simple binary classifier measures the difference between the extracted feature maps and makes a decision. Thus, the classifier can only solve the task correctly when our convolutional layers extract useful representations. These representations can then be used to solve many different tasks related to the data used for unsupervised training. As the main benchmark for all of our models, we use the STL-10 dataset, a widely used benchmark for unsupervised learning, where we train a linear classifier on top of our convolutional layers with a small amount of manually labeled images. We also combine our idea with recent work on the same topic, RotNet, which makes use of image rotations and therefore forces the network to learn rotation-dependent features from the dataset. As a result of this combination, we create a new procedure that outperforms the original RotNet.
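A minimal sketch of such a siamese patch-matching setup is shown below; the architecture and layer sizes are illustrative only, not the thesis code.

```python
# Sketch of the siamese setup described above: two weight-sharing convolutional
# branches encode an image patch and a full image, and a small binary classifier
# decides whether the patch was taken from that image. Sizes are illustrative.
import torch
from torch import nn

class SiamesePatchMatcher(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(            # shared convolutional encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.classifier = nn.Sequential(          # binary "same image?" head
            nn.Linear(2 * feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, patch, image):
        f_patch = self.backbone(patch)
        f_image = self.backbone(image)
        return self.classifier(torch.cat([f_patch, f_image], dim=1))  # logit
```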

Learning Object Representations by Mixing Scenes

Lukas Zbinden · May 2019.

In the digital age of ever increasing data amassment and accessibility, the demand for scalable machine learning models effective at refining the new oil is unprecedented. Unsupervised representation learning methods present a promising approach to exploit this invaluable yet unlabeled digital resource at scale. However, a majority of these approaches focuses on synthetic or simplified datasets of images. What if a method could learn directly from natural Internet-scale image data? In this thesis, we propose a novel approach for unsupervised learning of object representations by mixing natural image scenes. Without any human help, our method mixes visually similar images to synthesize new realistic scenes using adversarial training. In this process the model learns to represent and understand the objects prevalent in natural image data and makes them available for downstream applications. For example, it enables the transfer of objects from one scene to another. Through qualitative experiments on complex image data we show the effectiveness of our method along with its limitations. Moreover, we benchmark our approach quantitatively against state-of-the-art works on the STL-10 dataset. Our proposed method demonstrates the potential that lies in learning representations directly from natural image data and reinforces it as a promising avenue for future research.

Representation Learning using Semantic Distances

Markus Roth · May 2019.

Zero-Shot Learning Using Generative Adversarial Networks

Hamed Hemati · Dec. 2018.

Dimensionality Reduction via CNNs - Learning the Distance Between Images

Ioannis Glampedakis · Sept. 2018.

Learning to Play Othello Using Deep Reinforcement Learning and Self Play

Thomas Simon Steinmann · Sept. 2018.

ABA-J Interactive Multi-Modality Tissue Section-to-Volume Alignment: A Brain Atlasing Toolkit for ImageJ

Felix Meyenhofer · March 2018.

Learning Visual Odometry with Recurrent Neural Networks

Adrian Wälchli · Feb. 2018.

In computer vision, Visual Odometry is the problem of recovering the camera motion from a video. It is related to Structure from Motion, the problem of reconstructing the 3D geometry from a collection of images. Decades of research in these areas have produced successful algorithms that are used in applications like autonomous navigation, motion capture, augmented reality and others. Despite the success of these prior works in real-world environments, their robustness is highly dependent on manual calibration and on the magnitude of noise present in the images in the form of, e.g., non-Lambertian surfaces, dynamic motion and other forms of ambiguity. This thesis explores an alternative approach to the Visual Odometry problem via Deep Learning, that is, a specific form of machine learning with artificial neural networks. It describes and focuses on the implementation of a recent work that proposes the use of Recurrent Neural Networks to learn dependencies over time due to the sequential nature of the input. Together with a convolutional neural network that extracts motion features from the input stream, the recurrent part accumulates knowledge from the past to estimate the camera pose at each point in time. An analysis of the performance of this system is carried out on real and synthetic data. The evaluation covers several ways of training the network as well as the impact and limitations of the recurrent connection for Visual Odometry.
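To make the described architecture concrete, below is a minimal sketch of a CNN+LSTM visual odometry model; the layer sizes and pose parameterization are arbitrary illustrations, not the thesis implementation.

```python
# Sketch of a recurrent visual odometry model in the spirit of the description
# above: a CNN extracts motion features from pairs of consecutive frames, and an
# LSTM accumulates them over time to predict a 6-DoF pose at each time step.
import torch
from torch import nn

class RecurrentVO(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=512):
        super().__init__()
        self.cnn = nn.Sequential(                      # takes two stacked RGB frames
            nn.Conv2d(6, 32, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.pose_head = nn.Linear(hidden_dim, 6)      # translation + rotation

    def forward(self, frame_pairs):
        # frame_pairs: (B, T, 6, H, W), consecutive frames stacked channel-wise
        b, t, c, h, w = frame_pairs.shape
        feats = self.cnn(frame_pairs.reshape(b * t, c, h, w)).reshape(b, t, -1)
        hidden, _ = self.rnn(feats)                    # accumulate over time
        return self.pose_head(hidden)                  # (B, T, 6) relative poses
```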

Crime location and timing prediction

Bernard Swart · Jan. 2018.

From Cartoons to Real Images: An Approach to Unsupervised Visual Representation Learning

Simon Jenni · Feb. 2017.

Automatic and Large-Scale Assessment of Fluid in Retinal OCT Volume

Nina Mujkanovic · Dec. 2016.

Segmentation in 3D Using Eye-Tracking Technology

Michele Wyss · July 2016.

Accurate Scale Thresholding via Logarithmic Total Variation Prior

Remo Diethelm · Aug. 2014.

Unsupervised Object Segmentation with Generative Models

Adam Jakub Bielski · April 2024.

Advances in computer vision have transformed how we interact with technology, driven by significant breakthroughs in scalable deep learning and the availability of large datasets. These technologies now play a crucial role in various applications, from improving user experience through applications like organizing digital photo libraries, to advancing medical diagnostics and treatments. Despite these valuable applications, the creation of annotated datasets remains a significant bottleneck. It is not only costly and labor-intensive but also prone to inaccuracies and human biases. Moreover, it often requires specialized knowledge or careful handling of sensitive information. Among the tasks in computer vision, image segmentation particularly highlights these challenges, with its need for precise pixel-level annotations. This context underscores the need for unsupervised approaches in computer vision, which can leverage the large volumes of unlabeled images produced every day. This thesis introduces several novel methods for learning fully unsupervised object segmentation models using only collections of images. Unlike much prior work, our approaches are effective on complex real-world images and do not rely on any form of annotations, including pre-trained supervised networks, bounding boxes, or class labels. We identify and leverage intrinsic properties of objects – most notably, the cohesive movement of object parts – as powerful signals for driving unsupervised object segmentation. Utilizing innovative generative adversarial models, we employ this principle to either generate segmented objects or directly segment them in a manner that allows for realistic movement within scenes. Our work demonstrates how such generated data can train a segmentation model that effectively generalizes to real-world images. Furthermore, we introduce a method that, in conjunction with recent advances in self-supervised learning, achieves state-of-the-art results in unsupervised object segmentation. Our methods rely on the effectiveness of Generative Adversarial Networks, which are known to be challenging to train and exhibit mode collapse. We propose a new, more principled GAN loss, whose gradients encourage the generator model to explore missing modes in its distribution, addressing these limitations and enhancing the robustness of generative models.

Novel Techniques for Robust and Generalizable Machine Learning

Abdelhak Lemkhenter · Sept. 2023.

Neural networks have transcended their status as powerful proofs of concept and become a highly disruptive technology that has revolutionized many quantitative fields such as drug discovery, autonomous vehicles, and machine translation. Today, it is nearly impossible to go a single day without interacting with a neural network-powered application. From search engines to on-device photo processing, neural networks have become the go-to solution thanks to recent advances in computational hardware and an unprecedented scale of training data. Larger and less curated datasets, typically obtained through web crawling, have greatly propelled the capabilities of neural networks forward. However, this increase in scale amplifies certain challenges associated with training such models. Beyond toy or carefully curated datasets, data in the wild is plagued with biases, imbalances, and various noisy components. Given the larger size of modern neural networks, such models run the risk of learning spurious correlations that fail to generalize beyond their training data. This thesis addresses the problem of training more robust and generalizable machine learning models across a wide range of learning paradigms for medical time series and computer vision tasks. The former is a typical example of a low signal-to-noise ratio data modality with a high degree of variability between subjects and datasets. There, we tailor the training scheme to focus on robust patterns that generalize to new subjects and to ignore the noisier, subject-specific patterns. To achieve this, we first introduce a physiologically inspired unsupervised training task and then extend it by explicitly optimizing for cross-dataset generalization using meta-learning. In the context of image classification, we address the challenge of training semi-supervised models under class imbalance by designing a novel label refinement strategy with higher local sensitivity to minority-class samples while preserving the global data distribution. Lastly, we introduce a new Generative Adversarial Networks training loss. Such generative models could be applied to improve the training of subsequent models in the low-data regime by augmenting the dataset with generated samples. Unfortunately, GAN training relies on a delicate balance between its components, making it prone to mode collapse. Our contribution consists of defining a more principled GAN loss whose gradients incentivize the generator model to seek out missing modes in its distribution. All in all, this thesis tackles the challenge of training more robust machine learning models that can generalize beyond their training data. This necessitates the development of methods specifically tailored to handle the diverse biases and spurious correlations inherent in the data. It is important to note that achieving greater generalizability in models goes beyond simply increasing the volume of data; it requires meticulous consideration of training objectives and model architecture. By tackling these challenges, this research contributes to advancing the field of machine learning and underscores the significance of thoughtful design in obtaining more resilient and versatile models.

Automated Sleep Scoring, Deep Learning and Physician Supervision

Luigi Fiorillo · Oct. 2022.

Sleep plays a crucial role in human well-being. Polysomnography is used in sleep medicine as a diagnostic tool to objectively analyze the quality of sleep. Sleep scoring is the procedure of extracting sleep cycle information from the whole-night electrophysiological signals. The scoring is done worldwide by sleep physicians according to the official American Academy of Sleep Medicine (AASM) scoring manual. In the last decades, a wide variety of deep learning based algorithms have been proposed to automate the sleep scoring task. In this thesis we study the reasons why these algorithms have not been introduced into the daily clinical routine, with the perspective of bridging the existing gap between automatic sleep scoring models and sleep physicians. In this light, the primary step is the design of a simplified sleep scoring architecture that also provides an estimate of the model uncertainty. Besides achieving results on par with the most up-to-date scoring systems, we demonstrate the efficiency of ensemble learning based algorithms, together with label smoothing techniques, in both enhancing the performance and calibrating the simplified scoring model. We introduce an uncertainty estimation procedure to identify the most challenging sleep stage predictions and to quantify the disagreement between the predictions given by the model and the annotations given by the physicians. In this thesis we also propose a novel method to integrate the inter-scorer variability into the training procedure of a sleep scoring model. We clearly show that a deep learning model is able to encode this variability and thereby better adapt to the consensus of a group of scoring physicians. We finally address the generalization ability of a deep learning based sleep scoring system, further studying its resilience to sleep complexity and to the AASM scoring rules. We can state that there is no need to train the algorithm strictly following the AASM guidelines. Most importantly, using data from multiple data centers results in a better performing model compared with training on a single data cohort. The variability among different scorers and data centers needs to be taken into account, more than the variability among sleep disorders.

Learning Representations for Controllable Image Restoration

Givi Meishvili · March 2022.

Deep Convolutional Neural Networks have sparked a renaissance in all the sub-fields of computer vision. Tremendous progress has been made in the area of image restoration. The research community has pushed the boundaries of image deblurring, super-resolution, and denoising. However, given a distorted image, most existing methods typically produce a single restored output. The tasks mentioned above are inherently ill-posed, leading to an infinite number of plausible solutions. This thesis focuses on designing image restoration techniques capable of producing multiple restored results and granting users more control over the restoration process. Towards this goal, we demonstrate how one could leverage the power of unsupervised representation learning. Image restoration is vital when applied to distorted images of human faces due to their social significance. Generative Adversarial Networks enable an unprecedented level of generated facial details combined with smooth latent space. We leverage the power of GANs towards the goal of learning controllable neural face representations. We demonstrate how to learn an inverse mapping from image space to these latent representations, tuning these representations towards a specific task, and finally manipulating latent codes in these spaces. For example, we show how GANs and their inverse mappings enable the restoration and editing of faces in the context of extreme face super-resolution and the generation of novel view sharp videos from a single motion-blurred image of a face. This thesis also addresses more general blind super-resolution, denoising, and scratch removal problems, where blur kernels and noise levels are unknown. We resort to contrastive representation learning and first learn the latent space of degradations. We demonstrate that the learned representation allows inference of ground-truth degradation parameters and can guide the restoration process. Moreover, it enables control over the amount of deblurring and denoising in the restoration via manipulation of latent degradation features.

Learning Generalizable Visual Patterns Without Human Supervision

Simon Jenni · Oct. 2021.

Owing to the existence of large labeled datasets, Deep Convolutional Neural Networks have ushered in a renaissance in computer vision. However, almost all of the visual data we generate daily - several human lives worth of it - remains unlabeled and thus out of reach of today's dominant supervised learning paradigm. This thesis focuses on techniques that steer deep models towards learning generalizable visual patterns without human supervision. Our primary tool in this endeavor is the design of Self-Supervised Learning tasks, i.e., pretext tasks for which labels do not involve human labor. Besides enabling learning from large amounts of unlabeled data, we demonstrate how self-supervision can capture relevant patterns that supervised learning largely misses. For example, we design learning tasks that learn deep representations capturing shape from images, motion from video, and 3D pose features from multi-view data. Notably, these tasks' design follows a common principle: the recognition of data transformations. The strong performance of the learned representations on downstream vision tasks such as classification, segmentation, action recognition, or pose estimation validates this pretext-task design. This thesis also explores the use of Generative Adversarial Networks (GANs) for unsupervised representation learning. Besides leveraging generative adversarial learning to define image transformations for self-supervised learning tasks, we also address training instabilities of GANs through the use of noise. While unsupervised techniques can significantly reduce the burden of supervision, in the end, we still rely on some annotated examples to fine-tune learned representations towards a target task. To improve the learning from scarce or noisy labels, we describe a supervised learning algorithm with improved generalization in these challenging settings.

Learning Interpretable Representations of Images

Attila Szabó · June 2019.

Computers represent images with pixels, and each pixel contains three numbers for the red, green and blue colour values. These numbers are meaningless for humans, and they are mostly useless when used directly with classical machine learning techniques like linear classifiers. Interpretable representations are the attributes that humans understand: the colour of the hair, the viewpoint of a car or the 3D shape of the object in the scene. Many computer vision tasks can be viewed as learning interpretable representations; for example, a supervised classification algorithm directly learns to represent images with their class labels. In this work we aim to learn interpretable representations (or features) indirectly, with lower levels of supervision. This approach has the advantage of cost savings on dataset annotations and the flexibility of using the features for multiple follow-up tasks. We made contributions in three main areas: weakly supervised learning, unsupervised learning and 3D reconstruction. In the weakly supervised case we use image pairs as supervision. Each pair shares a common attribute and differs in a varying attribute. We propose a training method that learns to separate the attributes into separate feature vectors. These features are then used for attribute transfer and classification. We also show theoretical results on the ambiguities of the learning task and the ways to avoid degenerate solutions. We show a method for unsupervised representation learning that separates semantically meaningful concepts. We explain how the components of our proposed method (a mixing autoencoder, a generative adversarial net and a classifier) work together, and we support this with ablation studies. We propose a method for learning single-image 3D reconstruction. It uses only images; no human annotation, stereo, synthetic renderings or ground-truth depth maps are needed. We train a generative model that learns the 3D shape distribution and an encoder to reconstruct the 3D shape. For that we exploit the notion of image realism: the 3D reconstruction of the object has to look realistic when it is rendered from different random angles. We prove the efficacy of our method from first principles.

Learning Controllable Representations for Image Synthesis

Qiyang Hu · June 2019.

In this thesis, our focus is learning a controllable representation and applying the learned controllable feature representation to image synthesis, video generation, and even 3D reconstruction. We propose different methods to disentangle the feature representation in neural networks and analyze the challenges in disentanglement, such as the reference ambiguity and the shortcut problem, when using weak labels. We use the disentangled feature representation to transfer attributes between images, such as exchanging hairstyles between two face images. Furthermore, we study how another type of feature, the sketch, works in a neural network. A sketch can provide the shape and contour of an object, such as the silhouette of a side-view face. We leverage the silhouette constraint to improve 3D face reconstruction from 2D images. A sketch can also provide the moving direction of an object; thus we investigate how one can manipulate an object to follow the trajectory provided by a user sketch. We propose a method to automatically generate video clips from a single input image, using the sketch as motion and trajectory guidance to animate the object in that image. We demonstrate the efficiency of our approaches on several synthetic and real datasets.

Beyond Supervised Representation Learning

Mehdi Noroozi · Jan. 2019.

The complexity of any information processing task is highly dependent on the space in which the data is represented. Unfortunately, pixel space is not appropriate for computer vision tasks such as object classification. Traditional computer vision approaches involve a multi-stage pipeline in which images are first transformed into a feature space through a handcrafted function and the task is then solved in that feature space. The challenge with this approach is the complexity of designing handcrafted functions that extract robust features. Deep learning based approaches address this issue by end-to-end training of a neural network for some task, which lets the network discover the appropriate representation for the training task automatically. It turns out that the image classification task on large-scale annotated datasets yields a representation transferable to other computer vision tasks. However, supervised representation learning is limited by the need for annotations. In this thesis we study self-supervised representation learning, where the goal is to alleviate these limitations by substituting the classification task with pseudo tasks for which the labels come for free. We discuss self-supervised learning by solving jigsaw puzzles, which uses context as a supervisory signal. The rationale behind this task is that the network needs to extract features about object parts and their spatial configurations to solve the jigsaw puzzles. We also discuss a method for representation learning that uses an artificial supervisory signal based on counting visual primitives. This supervisory signal is obtained from an equivariance relation. We use two image transformations in the context of counting: scaling and tiling. The first transformation exploits the fact that the number of visual primitives should be invariant to scale. The second transformation allows us to equate the total number of visual primitives in each tile to that in the whole image. The most effective transfer strategy is fine-tuning, which restricts one to using the same model, or parts thereof, for both the pretext and target tasks. We discuss a novel framework for self-supervised learning that overcomes limitations in designing and comparing different tasks, models, and data domains. In particular, our framework decouples the structure of the self-supervised model from the final task-specific fine-tuned model. Finally, we study the problem of multi-task representation learning. A naive approach to enhance the representation learned by a task is to train the task jointly with other tasks that capture orthogonal attributes. Having a diverse set of auxiliary tasks imposes challenges on multi-task training from scratch. We propose a framework that allows us to combine arbitrarily different feature spaces into a single deep neural network. We reduce the auxiliary tasks to classification tasks and, consequently, the multi-task learning problem to a multi-label classification task. Nevertheless, combining multiple representation spaces without being aware of the target task might be suboptimal. As our second contribution, we show empirically that this is indeed the case and propose to combine multiple tasks after fine-tuning on the target task.

Motion Deblurring from a Single Image

Meiguang Jin · Dec. 2018.

With the information explosion, a tremendous number of photos are captured and shared via social media every day. Technically, a photo requires a finite exposure to accumulate light from the scene. Thus, objects moving during the exposure generate motion blur in a photo. Motion blur is an image degradation that makes visual content less interpretable and is therefore often seen as a nuisance. Although motion blur can be reduced by setting a short exposure time, the insufficient amount of light has to be compensated for by increasing the sensor's sensitivity, which inevitably introduces a large amount of sensor noise. This motivates the need to remove motion blur computationally. Motion deblurring is an important problem in computer vision and is challenging due to its ill-posed nature, which means the solution is not well defined. Mathematically, a blurry image caused by uniform motion is formed by the convolution of a blur kernel with a latent sharp image. Potentially there are infinitely many pairs of blur kernel and latent sharp image that can result in the same blurry image. Hence, some prior knowledge or regularization is required to address this problem. Even if the blur kernel is known, restoring the latent sharp image is still difficult, as high-frequency information has been removed. Although we can model the uniform motion deblurring problem mathematically, it can only address in-plane translational camera motion. In practice, motion is more complicated and can be non-uniform. Non-uniform motion blur can come from many sources: camera out-of-plane rotation, scene depth changes, object motion, and so on. Thus, it is more challenging to remove non-uniform motion blur. In this thesis, our focus is motion blur removal. We aim to address four challenging motion deblurring problems. We start from the noise-blind image deblurring scenario where the blur kernel is known but the noise level is unknown. We introduce an efficient and robust solution based on a Bayesian framework using a smooth generalization of the 0-1 loss to address this problem. Then we study the blind uniform motion deblurring scenario where both the blur kernel and the latent sharp image are unknown. We exploit the relative scale ambiguity between the latent sharp image and the blur kernel to address this issue. Moreover, we study the face deblurring problem and introduce a novel deep learning network architecture to solve it. We also address the general motion deblurring problem, and in particular we aim at recovering a sequence of 7 frames, each depicting some instantaneous motion of the objects in the scene.

Towards a Novel Paradigm in Blind Deconvolution: From Natural to Cartooned Image Statistics

Daniele Perrone · July 2015.

In this thesis we study the blind deconvolution problem. Blind deconvolution consists in the estimation of a sharp image and a blur kernel from an observed blurry image. Because the blur model admits several solutions, it is necessary to devise an image prior that favors the true blur kernel and sharp image. Recently it has been shown that a class of blind deconvolution formulations and image priors has the no-blur solution as its global minimum. Despite this shortcoming, algorithms based on these formulations and priors can successfully solve blind deconvolution. In this thesis we show that a suitable initialization can exploit the non-convexity of the problem and yield the desired solution. Based on these conclusions, we propose a novel “vanilla” algorithm stripped of any enhancement typically used in the literature. Our algorithm, despite its simplicity, is able to compete with the top performers on several datasets. We have also investigated a remarkable behavior of a 1998 algorithm whose formulation has the no-blur solution as its global minimum: even when initialized at the no-blur solution, it converges to the correct solution. We show that this behavior is caused by an apparently insignificant implementation strategy that makes the algorithm no longer minimize the original cost functional. We also demonstrate that this strategy improves the results of our “vanilla” algorithm. Finally, we present a study of image priors for blind deconvolution. We provide experimental evidence supporting the recent belief that a good image prior is one that leads to a good blur estimate rather than one that is a good natural image statistical model. By focusing on the blur estimation alone, we show that good blur estimates can be obtained even when using images quite different from the true sharp image. This allows the use of image priors, such as those leading to “cartooned” images, that avoid the no-blur solution. By using an image prior that produces “cartooned” images we achieve state-of-the-art results on different publicly available datasets. We therefore suggest a shift of paradigm in blind deconvolution: from modeling natural image statistics to modeling cartooned image statistics.

New Perspectives on Uncalibrated Photometric Stereo

Thoma Papadhimitri · June 2014.

This thesis investigates the problem of 3D reconstruction of a scene from 2D images. In particular, we focus on photometric stereo, a technique that computes the 3D geometry from at least three images taken from the same viewpoint under different illumination conditions. When the illumination is unknown (uncalibrated photometric stereo), the problem is ambiguous: different combinations of geometry and illumination can generate the same images. First, we solve the ambiguity by exploiting the Lambertian reflectance maxima. These are points on curved surfaces where the normals are parallel to the light direction. We then propose a solution that can be computed in closed form and thus very efficiently. Our algorithm is also very robust and always yields the same estimate regardless of the initial ambiguity. We validate our method on real-world experiments and achieve state-of-the-art results. In this thesis we also solve, for the first time, the uncalibrated photometric stereo problem under the perspective projection model. We show that, unlike in the orthographic case, one can uniquely reconstruct the normals of the object and the lights given only the input images and the camera calibration (focal length and image center). We also propose a very efficient algorithm, which we validate on synthetic and real-world experiments, and show that the proposed technique is a generalization of the orthographic case. Finally, we investigate the uncalibrated photometric stereo problem in the case where the lights are distributed near the scene. In this case we propose an alternating minimization technique which converges quickly and overcomes the limitations of prior work that assumes distant illumination. We show experimentally that adopting a near-light model for real-world scenes yields very accurate reconstructions.
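As background to the uncalibrated setting studied in the thesis, the sketch below shows the classical calibrated Lambertian photometric-stereo solve with known light directions, assuming NumPy; the uncalibrated problem removes the assumption that the lights are known.

```python
# Background sketch: calibrated Lambertian photometric stereo. With known light
# directions L (k x 3) and k >= 3 intensity measurements per pixel, the
# albedo-scaled normal is recovered by least squares. The uncalibrated setting
# studied in the thesis does not assume L is known.
import numpy as np

def photometric_stereo(images, lights):
    # images: (k, H, W) grayscale images taken under k different lights
    # lights: (k, 3) known light directions
    k, h, w = images.shape
    I = images.reshape(k, -1)                          # (k, H*W)
    B, *_ = np.linalg.lstsq(lights, I, rcond=None)     # solves lights @ B = I, B is (3, H*W)
    albedo = np.linalg.norm(B, axis=0)                 # per-pixel albedo
    normals = B / (albedo + 1e-8)                      # unit surface normals
    return normals.reshape(3, h, w), albedo.reshape(h, w)
```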

METHODS article

Deep learning for understanding satellite imagery: an experimental survey.

Sharada Prasanna Mohanty

  • 1 Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  • 2 neptune.ml, Warsaw, Poland
  • 3 Warsaw University of Technology, Warsaw, Poland
  • 4 deepsense.ai, Warsaw, Poland
  • 5 Centre for Natural Resources Management, Analysis, Training and Policy Research (NARMA), Kathmandu, Nepal
  • 6 Zurich University of Applied Sciences, Zürich, Switzerland
  • 7 Anthropocene Labs, New York, NY, United States
  • 8 Center for Cognitive Interaction Technology (CITEC), Bielefeld University, Bielefeld, Germany
  • 9 International Institute of Information Technology Hyderabad, Hyderabad, India

Translating satellite imagery into maps requires intensive effort and time, which often results in inaccurate or outdated maps of affected regions during disasters and conflicts. The combination of recently available datasets and advances in computer vision through deep learning has paved the way toward automated satellite image translation. To facilitate research in this direction, we introduce the Satellite Imagery Competition using a modified SpaceNet dataset. Participants had to come up with different segmentation models to detect the positions of buildings in satellite images. In this work, we present five approaches based on improvements of U-Net and Mask R-CNN (Region-based Convolutional Neural Network) models, coupled with unique training adaptations using boosting algorithms, morphological filtering, Conditional Random Fields and custom losses. The good results from these models, with average precision up to AP = 0.937 and average recall up to AR = 0.959, demonstrate the feasibility of deep learning for automated satellite image annotation.

1. Introduction

Despite substantial advances in global human well-being, the world continues to experience humanitarian crises and natural disasters. Long-term and reignited conflicts affect people in many parts of the world, but accurate maps of the affected regions often either do not exist or are rendered outdated by disaster or conflict. Satellite imagery is readily available to humanitarian organizations, but translating images into maps is an intensive effort. Today, maps are produced by specialized organizations or in volunteer events such as mapathons, where imagery is annotated with roads, buildings, farms, rivers, etc. In this work, we explore how machine learning can help pave the way for automated analysis of satellite imagery to generate relevant and real-time maps.

State-of-the-art results in deep learning have become increasingly accessible to a variety of domains over the last few years ( LeCun et al., 2015 ), mainly thanks to the advent of end-to-end approaches ( LeCun et al., 2015 ) and access to vast amounts of openly available data and high-performance compute. The same does not, however, hold true for the research community interested in satellite imagery and remote sensing. While access to high-performance compute infrastructure has not been an inhibiting factor, limited access to high-resolution imagery remains a major obstacle to high-quality AI/ML research in satellite imagery and remote sensing.

This work builds on top of a recently released open dataset, SpaceNet (v1) ( Spacenet on aws, 2018 ), which, in partnership with DigitalGlobe, released raw multiband satellite imagery at resolutions as high as 30 cm for numerous cities such as Las Vegas, Paris, Shanghai and Khartoum, along with the corresponding annotations of buildings and roads. In this work, we focus on the problem of instance segmentation on a simplified version of the SpaceNet dataset, in order to detect buildings in different urban settings in high-resolution satellite imagery. A large-scale competition was organized by the challenge platform crowdAI, which released a simplified version (details in Section 3) of the SpaceNet dataset and attracted 55 participants and 719 submissions. In general, different architectures for image segmentation have been proposed in the past. Mask R-CNN and U-Net type architectures are currently seen as state-of-the-art for such problems. This has been further substantiated by the success of such architectures in this competition as well as in other satellite imagery applications. The top contestants all fall into these two basic categories, and both categories show comparably high performance. Five different adaptations of U-Net and Mask R-CNN based approaches were applied in the context of this problem and showed top performance in the segmentation challenge. The different improvements and results for these five approaches are outlined in this paper. The next section will review related work with a particular focus on the development of U-Net and Mask R-CNN types of architectures. This will be followed by a brief description of the dataset used and the applied evaluation metrics. Afterwards, the different methods will be explained and presented together with accompanying results, and we will analyze the effect of the depth of the U-Net structure on the results. A brief section will provide a comparison of the approaches, followed by the conclusion.

2. Related Work

Semantic segmentation deals with the task of assigning each pixel in a given image to one of potentially multiple classes. It involves recognizing which objects are shown and where exactly they are located in the image. As such, it is a challenging task that requires, on the one hand, taking into account the overall context of the image and, for each pixel, that of the surrounding area. On the other hand, it requires labeling each pixel individually, focusing on a very fine level of detail. While approaches to semantic segmentation have been around for a long time (see the review of more traditional approaches in Thoma (2016) , or, for example, He et al. (2004) , Shotton et al. (2009) ), the recent success of Deep Neural Networks in image related tasks ( Krizhevsky et al., 2012 ) has carried over to the area of semantic segmentation as well. Deep Neural Networks, and in particular Convolutional Neural Networks, have revolutionized the area of image classification during the last decade and are now the dominant approach for image classification, leading to deeper and deeper architectures ( He et al., 2016 ). This became possible through algorithmic advances, such as rectified linear units that avoid vanishing gradients during training ( Krizhevsky et al., 2012 ), as well as through convolutional and pooling layers that had originally been proposed long before ( Fukushima, 1980 ). Such approaches deal with the question of what is shown in a given image. Convolutional filters, which focus on small portions of the image and are moved over the whole image, allow the network to learn successively more and more abstract structures and invariances in images. Learning becomes efficient through weight sharing, and the whole network can be trained in an end-to-end fashion. Together with pooling layers, the focus and receptive field of each deeper layer successively broadens until a very coarse latent space summarizes input from large portions of the image and can be used for classification. This step-by-step abstraction helps to resolve invariances such as translations of objects and supports classification. A drawback, however, is that this abstraction loses resolution and the fine structural detail needed for semantic segmentation.
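As a small illustration of this abstraction/resolution trade-off, the sketch below stacks convolution and pooling blocks in PyTorch; the layer sizes are arbitrary.

```python
# Each convolution + pooling block halves the spatial resolution while growing
# the receptive field: useful for classification, but fine detail is discarded.
import torch
from torch import nn

def block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(kernel_size=2),
    )

encoder = nn.Sequential(block(3, 32), block(32, 64), block(64, 128))

x = torch.randn(1, 3, 256, 256)
print(encoder(x).shape)  # torch.Size([1, 128, 32, 32]) -- 1/8 of the input resolution
```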

While classification addresses what is shown in an image, semantic segmentation additionally deals with where exactly something is shown in the image. Nonetheless, the introduction of Deep Learning techniques dramatically improved segmentation accuracy, and they therefore became the predominant approach in this area as well. This further promoted the field, and the increasingly better results led to the broad application of these approaches in commercial products.

In the following, we will review prominent developments in semantic segmentation using Deep Neural Networks. In particular, the focus will be on U-Net like approaches employing forms of convolutions together with deconvolution or upsampling, as well as on Mask R-CNN, because in the described challenge these kinds of approaches proved to be the best performing ones. For a broader overview, there are different surveys and reviews on the current state of semantic segmentation. Lateef and Ruichek (2019) provide a systematic and exhaustive review of different categories of approaches employing Deep Learning techniques and present available benchmarks and datasets as well as evaluation criteria. Hao et al. (2020) put a different focus on the degree of supervision during training. More traditional approaches are summarized in Thoma (2016) , and recent advances are briefly addressed in Atif et al. (2019) and Minaee et al. (2020) .

A crucial first architecture was the Fully Convolutional Network (FCN) ( Long et al., 2015 ), which can be applied to images of any dimension. In general, it uses a convolutional network architecture for the first layers: blocks of convolution and max pooling layers are applied in sequence until the image is downsized to 1/32nd of its original dimensions. While in classification fully-connected layers would then be applied to this latent space, in FCNs class predictions are made at this coarse level of detail for the individual spatial cells. Afterwards, the assigned labels are scaled up to the original image size using a sequence of upsampling and deconvolutional layers. While the downsampling path collects contextual information over larger areas at a coarse resolution, the upsampling path is tasked with reconstructing more detailed spatial information. This architecture led to clear improvements on the PASCAL VOC 2012 dataset and has in particular the advantage that it can be trained in an end-to-end fashion without requiring feature selection or tuning. Architectures of this type employing convolutional layers are today the standard approach for semantic segmentation tasks ( Lateef and Ruichek, 2019 ).

One disadvantage found in early approaches using DNNs for semantic segmentation is that detailed structure tends to get lost and fine structures in images appear washed out. The FCN addressed this, on the one hand, by not using a very deep architecture, which would otherwise lead to overly large receptive fields, and, on the other hand, by already introducing skip connections for the last steps. Skip connections provide information from earlier layers in the processing sequence that operate at a more fine-grained resolution. The outputs of these earlier layers are used as an additional input to the later stage, which therefore receives not only information from the directly preceding layer but also the rerouted information from an earlier layer through the skip connections. The weights of these connections are adapted during training as well. This idea of skip connections has been further refined in U-Net type architectures and has in general been found to be quite effective (see, e.g., Chen et al., 2018 , who used shortcut connections that enhanced the results). U-Net ( Ronneberger et al., 2015 ) is an improvement of the FCN and consists of a symmetric arrangement of a contracting and an expansive path. Following a general trend toward smaller convolutional filters, the contracting path consists of repeated blocks of two 3 × 3 convolutions followed by a 2 × 2 max pooling layer. The expansive path is symmetric, but up-convolutions replace the max pooling layers. Importantly, corresponding layers of both paths are connected by skip connections (see Figure 5 ). These provide detailed information at the same resolution as required for the output of the corresponding layer in the expansive path. U-Net is a simple architecture that has become very popular, as it can be implemented quite efficiently and the introduction of local information at every level of detail leads to much improved results. Many further architectures have been built following a general U-Net or encoder-decoder like structure, for example by exchanging the different processing blocks (convolutions and pooling layers) with refined and further improved blocks. Drozdzal et al. (2016) introduced residual blocks that add further skip connections inside each block as residual connections. This, in general, allows for deeper networks and better training, which was reflected in their results as well. As a further step, Jégou et al. (2017) applied two dense blocks in each of the paths; each block consists of multiple stacked convolutional layers connected by residual connections, and the information from all the layers inside a block is aggregated through skip-like connections at the output of the block. As a result, the output of each block contains low-level as well as high-level features at different resolutions. This provided further state-of-the-art results.
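A minimal single-level U-Net-style sketch in PyTorch follows, showing the skip connection between corresponding resolutions; a full U-Net stacks several such levels, and the channel sizes here are illustrative only.

```python
# Minimal U-Net-style sketch: one contracting and one expanding level with a
# skip connection that concatenates features of matching resolution.
import torch
from torch import nn

class TinyUNet(nn.Module):
    def __init__(self, in_ch=3, num_classes=1):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, num_classes, 1)

    def forward(self, x):
        skip = self.enc(x)                         # full-resolution features
        x = self.bottleneck(self.down(skip))       # coarse, context-rich features
        x = self.up(x)                             # upsample back to full resolution
        x = self.dec(torch.cat([x, skip], dim=1))  # skip connection: concatenate and fuse
        return self.head(x)                        # per-pixel class logits
```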

In general, downsampling in the contracting path - the concatenation of convolutions and pooling operations - aims at increasing the receptive field and taking more context into account, as required for classification. But this increase comes with a reduced spatial resolution. As an alternative, dilated convolutions ( Yu and Koltun, 2015 ) also increase the receptive field, but without reducing spatial resolution. In dilated (or atrous) convolutions, the entries of a convolutional filter are not applied to adjacent entries of the input; instead, only every l-th entry of the input is processed, with l being the dilation factor that determines the spacing between filter taps. This increases the size of the receptive field dramatically (over multiple layers it grows exponentially) and can still be implemented reasonably efficiently as sparse convolutions. As a result, such layers allow deriving contextual information at multiple scales without losing resolution ( Yu and Koltun, 2015 ). Already the initial approach showed state-of-the-art performance. One disadvantage is that dilated convolutions tend to produce gridding artifacts that stem from the systematic structure of the constructed filters. Further improvements used spatial pyramid pooling modules, as for example in DeepLab ( Chen et al., 2018a ) or DeepLabv3+, in which this is complemented by a simple decoder module ( Chen et al., 2018b ). As an alternative, FastFCN was recently developed ( Wu et al., 2019 ). Since processing dilated convolutions requires considerable memory and time, this approach started from the FCN, including recent improvements, but replaced dilated convolutions in the expansive path with Joint Pyramid Upsampling, which proved more efficient while still producing good results.
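The effect of the dilation factor can be sketched as follows, assuming a PyTorch setting; this is only an illustration of the mechanism described above.

```python
# Sketch of a dilated (atrous) convolution: with dilation factor l, the taps of
# a 3x3 filter are spaced l pixels apart, enlarging the receptive field without
# reducing spatial resolution (the padding keeps the output the same size).
import torch
from torch import nn

x = torch.randn(1, 64, 128, 128)
for dilation in (1, 2, 4, 8):
    conv = nn.Conv2d(64, 64, kernel_size=3, padding=dilation, dilation=dilation)
    y = conv(x)
    # effective extent of a single 3x3 filter: 2 * dilation + 1 pixels
    print(dilation, tuple(y.shape), 2 * dilation + 1)  # spatial size stays 128x128
```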

While one advantage of the earlier proposed deep architectures was the possibility to train them in an end-to-end fashion, other approaches used additional pre- and post-processing. DeepLab ( Chen et al., 2018a ) applied Conditional Random Fields (CRF) ( Krähenbühl and Koltun, 2012 ) in a post-processing step, which in their case produced better outlines of objects in semantic segmentation. CRFs had been applied in the past as a post-processing step that takes contextual information nicely into account and leads to more coherent labels. CRFs were tested in one approach in the challenge described here, but it was found that such an explicit step can become unnecessary and did not appear helpful when sufficient detail was already trained into the Deep Neural Network (which is in agreement with other findings). Others have successfully integrated CRFs ( Zheng et al., 2015 ) into the training of the whole system, as these tend to produce quite good results close to object boundaries.

Region proposal based methods follow a different type of approach that has shown success in the past as well as in the results presented in this paper. Faster R-CNN ( Ren et al., 2015 ) and Mask R-CNN ( He et al., 2017 ) are examples of this type of architecture. These kinds of approaches consist of multiple stages. First, regions of the input image are identified and bounding boxes for possible objects are proposed. In Faster R-CNN ( Ren et al., 2015 ) a region proposal network was introduced in the form of a fully convolutional network. Second, features are extracted for each of these bounding boxes. In Faster R-CNN this was realized quite efficiently, as both stages can share features that are computed using convolutional layers. In Mask R-CNN a third step is applied in which, in the same way as in an FCN, the extracted features are not used to classify the object in that bounding box, but instead are used for a detailed pixel-wise prediction of class labels.
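For illustration, the sketch below runs a COCO-pretrained Mask R-CNN from torchvision to obtain region proposals, per-instance masks and scores; it only demonstrates the general pipeline and is unrelated to the competition models.

```python
# Sketch: inference with a pre-trained Mask R-CNN from torchvision, illustrating
# the region-proposal + per-box mask pipeline described above.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(pretrained=True)  # COCO weights; newer torchvision uses `weights=`
model.eval()

image = torch.rand(3, 512, 512)                 # placeholder RGB image with values in [0, 1]
with torch.no_grad():
    outputs = model([image])                    # list with one dict per input image

boxes = outputs[0]["boxes"]                     # proposed bounding boxes
masks = outputs[0]["masks"]                     # per-instance soft masks, shape (N, 1, H, W)
scores = outputs[0]["scores"]                   # confidence per detected instance
```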

For a more detailed overview see Lateef and Ruichek (2019) and Minaee et al. (2020) .

3. Dataset

The dataset used in this work was derived from the SpaceNet dataset ( Spacenet on aws, 2018 ). It provides a good basis for comparing learning approaches on remote sensing data (for a comparable dataset see Castillo-Navarro et al. (2019) ). Instead of considering all the channels in the multiband imagery of the SpaceNet dataset, we only focus on the RGB channels (for an example of an approach that also exploits spectral information see Ben Hamida et al. (2017) ). Excluding the non-RGB channels creates an alternate version of the SpaceNet dataset that makes the problem easier and more accessible to researchers in Deep Learning, who may or may not be familiar with the tools used by the Remote Sensing community to manipulate multiband imagery, and who are usually more familiar with the simple RGB images extensively used in Deep Learning research. Moreover, when considering only the RGB channels, the problem becomes a direct parallel of the very popular instance segmentation tasks commonly studied in Deep Learning research. At the same time, given the flexibility of most Deep Learning approaches, demonstrating good results using just the RGB channels suggests that extending the same approach to a multi-channel signal could provide even better results. The dataset consists of a training set of 280,741 images, a validation set of 60,317 images, and a test set of 60,697 images (see Figure 1).

4. Evaluation Metrics

The evaluation was principally based on the Intersection over Union ( IoU ) between the predicted mask and the ground truth.

For a known ground truth mask A and a predicted mask B , we first compute the IoU (Intersection over Union):

IoU ( A , B ) = | A ∩ B | / | A ∪ B | .

IoU measures the overall overlap between the true region and the proposed region. We then count a prediction as a true detection when it overlaps the ground truth by at least half, i.e., when IoU ≥ 0.5 .

Based on this, we define the following quantities (TP - true positive predictions, FP - false positives, FN - false negatives):

• Precision ( IoU ≥ 0.5 ) = TP / (TP + FP)

• Recall ( IoU ≥ 0.5 ) = TP / (TP + FN)

The final scoring parameters, AP (average precision) and AR (average recall), are computed by averaging the precision and recall values over all known annotations in the ground truth.
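As an illustration of how these quantities can be computed for binary building masks, the following NumPy sketch (our own simplified scorer, not the official challenge evaluation code) greedily matches predictions to ground-truth instances at IoU ≥ 0.5:

```python
import numpy as np

def iou(mask_a, mask_b):
    """IoU between two boolean masks: |A ∩ B| / |A ∪ B|."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union > 0 else 0.0

def precision_recall_at_05(gt_masks, pred_masks):
    """Greedy matching of predicted to ground-truth instances at IoU >= 0.5."""
    matched, tp = set(), 0
    for pred in pred_masks:
        best_j, best_iou = None, 0.0
        for j, gt in enumerate(gt_masks):
            if j in matched:
                continue
            score = iou(gt, pred)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None and best_iou >= 0.5:   # true detection
            matched.add(best_j)
            tp += 1
    fp = len(pred_masks) - tp                        # unmatched predictions
    fn = len(gt_masks) - tp                          # undetected ground-truth buildings
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```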

FIGURE 1 . Sample images from the Mapping Challenge Dataset showing the top-down view of satellite imagery.

FIGURE 2 . Schematic overview of the U-Net pipeline used in the first approach (Section 5). Nodes denote computational steps, arrows denote data flow. Please refer to Table 1 for description of each node.

FIGURE 3 . Each row represents a randomly sampled tile from the dataset. The 1 st column shows the input RGB image. The 2 nd column shows the ground truth mask. The 3 rd column visualizes the pixel-wise weight computed from the (inverse of) distance of a pixel to the two nearest buildings; high values correspond to pixels between nearby buildings. The 4 th column visualizes the pixel-wise weight computed from the (inverse of) size of an instance; high values denote small buildings (the smaller the building, the darker the color). Note that the background is fixed to black for both weight component visualizations.

TABLE 1 . Experiment results for the U-Net adaptation using Fast Learnings for Fast Mapping (Section 6).

5. Instance Segmentation Using Customized U-Net

As a first approach, we propose a two stage solution. The first stage is a neural network stage based on a U-Net ( Ronneberger et al., 2015 ) followed by a post-processing stage using gradient boosting ( Ke et al., 2017 ). Figure 2 presents the entire U-Net based pipeline, which takes raw RGB images along with some previously calculated meta-data as input, and predicts the instance segmentation masks.

(1) Customized U-Net based Neural Network: Inspired by Iglovikov et al. (2018) , we experimented with a U-Net using ResNet34, ResNet101 ( He et al., 2016 ), and ResNet152 as the encoder; the best results were obtained with ResNet101.

(2) Loss Design: From the initial experiments, we quickly recognized the importance of closely placed buildings and the tendency of the initial networks to group closely placed buildings into a single instance. The models also struggled with small instances (buildings with a small area in the image), which are equally important for the final evaluation metric (as the metric treats all instances equally when computing the Average Precision and Recall). There were many cases where an instance annotation was represented by barely a few pixels, because the annotation belonged to a building overflowing from an adjacent tile, with only a small part of the building visible (and annotated) in the current tile. In order to be robust to these issues, we designed a custom loss function by introducing two weighting factors (see Figure 3 for a visualization of the weighting factors). The first factor puts a higher weight on pixels that are close to other buildings, and the second factor puts a higher weight on pixels that belong to small objects. Equation 1 represents the loss function used in this approach. Let:

• x be the predictions from the U-Net,

• y be the ground truth,

• Loss_ce be the Cross Entropy Loss,

• Loss_dice be the Dice Loss,

• W_ce be the weight assigned to the Cross Entropy Loss,

• W_dice be the weight assigned to the Dice Loss,

• W_d be the distance weights (based on the distance to the two closest instances),

• W_s be the size weights.

Then W = W_d ⋅ W_s is the pixel weight, and the loss function is defined as

Loss = W_ce ⋅ ( W ⊙ Loss_ce ) + W_dice ⋅ Loss_dice , (1)

where ⊙ denotes the pixel-wise weighting of the cross entropy term.

The loss is the sum of two terms, the Cross Entropy Loss and the Dice Loss, each weighted by a real number from [0, 1]. The Cross Entropy Loss component is additionally weighted with the pixel weight (computed as the element-wise product of the distance weight and the size weight) to penalize mis-classifications on pixels belonging to small objects and to closely located instances. This lets us jointly optimize the model's ability to distinguish between two closely located buildings and its ability to segment out smaller instances. Figure 3 shows a visualization of the individual pixel weight components used in the custom loss function. These custom adaptations to the loss function significantly improved the performance of our model.
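A possible PyTorch rendition of this weighted loss is sketched below (our own naming and simplifications; the exact formulation used by the authors, e.g., the Dice variant or reduction, may differ):

```python
import torch
import torch.nn.functional as F

def weighted_bce_dice_loss(logits, target, w_d, w_s, w_ce=0.5, w_dice=0.5, eps=1e-6):
    """logits, target: (B, 1, H, W) tensors; w_d, w_s: per-pixel distance and size weights
    of the same shape. Cross entropy is weighted per pixel by W = W_d * W_s; the dice
    term is left unweighted, and both terms are mixed by w_ce and w_dice."""
    pixel_w = w_d * w_s                                         # W = W_d * W_s
    ce = F.binary_cross_entropy_with_logits(logits, target.float(), reduction="none")
    ce = (pixel_w * ce).mean()                                  # pixel-weighted cross entropy

    probs = torch.sigmoid(logits)
    inter = (probs * target).sum(dim=(1, 2, 3))
    denom = probs.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = 1.0 - (2.0 * inter + eps) / (denom + eps)            # soft dice loss per sample
    return w_ce * ce + w_dice * dice.mean()
```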

(3) Training Scheme: The following multi-stage training scheme, with pre-trained models (as available in PyTorch b ) as starting points, was used for better results (for both Average Precision and Average Recall ):

(1) Initialize the model with pre-trained weights,

(2) Train on a 50,000-tile subset of the training set with a learning rate of 10⁻⁴ and a dice weight of 0.5,

(3) Train on the full dataset with a learning rate of 10⁻⁴ and a dice weight of 0.5,

(4) Train with a reduced learning rate of 10⁻⁵ and a dice weight of 0.5,

(5) Train with a 10-fold increase in the dice weight (to 5.0) to make the final predictions smoother.

(4) Pre Processing (a sketch of the weight-map computation follows this list):

• For each pixel, the distances to the two closest instances are computed to create the distance map that is used for weighting the loss function.

• A size mask is produced for each image, encoding information about object size.

• Small masks on the edges of the image were dropped.
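The following NumPy/SciPy sketch illustrates one way such weight maps can be computed (our own illustration in the spirit of the U-Net weight map; the function names, the exponential form of the distance weight, and the square-root size weighting are assumptions rather than the authors' exact formulas):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_weight(instance_masks, w0=10.0, sigma=5.0):
    """Higher weight between nearby buildings, in the spirit of the U-Net weight map.
    instance_masks: non-empty list of boolean (H, W) arrays, one per building in the tile."""
    dists = np.stack([distance_transform_edt(~m) for m in instance_masks])
    dists.sort(axis=0)                      # per pixel: sorted distances to all buildings
    d1 = dists[0]
    d2 = dists[1] if len(instance_masks) > 1 else dists[0]
    return 1.0 + w0 * np.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))

def size_weight(instance_masks):
    """Higher weight on pixels belonging to small buildings (inverse of instance area)."""
    weight = np.ones(instance_masks[0].shape, dtype=np.float32)
    for m in instance_masks:
        weight[m] = 1.0 / np.sqrt(m.sum())  # the smaller the building, the larger the weight
    return weight
```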

(5) Post Processing:

• Test time augmentation: We made predictions on image rotations (90°, 180°, 270°) and flips (up–down, left–right) and used the geometric mean of the predictions as the final result.

• Second level model: We finally used gradient boosting to train a separate Light-GBM model on the first-stage output to compute the final prediction masks.

Our final performance on the held-out test set was an AP at IoU ≥ 0.5 of 0.938 and an AR at IoU ≥ 0.5 of 0.946. Figure 4 shows examples of some predictions made by the trained model.

FIGURE 4 . Each row represents a randomly sampled tile from the validation set; the 1 st column shows the input RGB image; the 2 nd column shows the model's prediction; the 3 rd column is constructed as the model's prediction superimposed on the input image; finally, the 4 th column shows the ground truth superimposed on the input image.

6. Fast Learnings for Fast Mapping Using U-Net

In this section, we approach the problem by reframing it as a binary semantic segmentation task. To summarize, we preprocessed ground truth labels into binary masks, trained with a 1-cycle learning rate schedule on a U-Net ( Ronneberger et al., 2015 ) inspired architecture with an ImageNet-pretrained ResNeXt50 ( Xie et al., 2016 ) encoder, and post-processed the probability maps into polygonized building footprint instances.

We recognized upfront that, with average precision at 0.5 IoU as the main evaluation metric, "good enough" mask predictions that overlap each true building by more than 50% were more important than obtaining the highest possible pixel-wise accuracy. We also kept in mind that buildings appear at different scales, and that the smallest buildings, or portions of buildings cut off at the borders of an image chip, would be the most challenging to perform well on. These considerations factored into our decision to use a U-Net architecture with a custom loss function that works well at multiple scales and balances pixel-wise accuracy with global IoU cost optimization.

In addition to the overall objectives, we also had the goal of refining our model development process to work best with fast, lightweight models and rapid experimentation on small datasets. These criteria are equally important to on-the-field planet-monitoring work where models need to be versatile in production and robust to highly diverse datasets and use cases.

We learned quickly that experiments on the full dataset would take about 1 day per experiment due to the large size of the training set (260k images) and test set (60k images). Therefore, our experimentation process emphasized doing many rapid and comparable iterations using downsized and small samples of the full dataset to minimize training time per experiment. We used a smaller sample set of the data (6k train, 1.5k val, 1.8k test) and confirmed that training and local evaluation on this sample set correlated well with performance on the full train and test sets. Using this smaller sample reduced training time from 2–3 h/epoch to 3–5 min/epoch. While the 40× speed-up helped significantly, a single experiment could still take 1.5 h or more (training to 30 epochs).

Further reducing time per experiment at the expense of lower accuracy, we downsized images to 64 × 64, 128 × 128, and 256 × 256 and benchmarked performance at each size (i.e., number of epochs to reach a certain loss/metric, best score at end of training, epochs to converge). With downsized, smaller samples and their corresponding benchmarks, we could test new ideas as quickly as 7 min per experiment (64 × 64 images trained for 30 epochs) and consistently compare new experimental results against each other. The most promising experimental settings were then benchmarked on the full training data and evaluated against the held out test set. We also visually inspected predictions against ground truth regularly and made qualitative notes about common failure cases.

For experimentation, we used one remote GPU instance (Nvidia Quadro P6000 with 24 GB of GPU memory), PyTorch 0.3 with the Fast.ai library, and Anaconda Jupyter notebooks to run experiments and document results.

6.1. Pre Processing

Pre-processing of the training data consisted of reflect-padding images to 320 × 320, which helped increase the visible area of buildings cut off at the sides and corners of each tile. Polygon ground truth labels were converted to binary pixel masks without any other modifications. The data was augmented with random vertical/horizontal flips, 90 ± 4° rotations, and slight image brightness and contrast changes.
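A minimal NumPy sketch of this pre-processing is given below (our own illustration; the authors used the Fast.ai transforms, and the small ±4° rotations and photometric changes are omitted here):

```python
import numpy as np

def reflect_pad_to_320(image, mask):
    """Reflect-pad a 300x300 RGB tile and its binary mask to 320x320 (10 px per side)."""
    image = np.pad(image, ((10, 10), (10, 10), (0, 0)), mode="reflect")
    mask = np.pad(mask, ((10, 10), (10, 10)), mode="reflect")
    return image, mask

def random_flip_rotate(image, mask, rng=np.random):
    """Random vertical/horizontal flips and 90-degree rotations."""
    if rng.rand() < 0.5:
        image, mask = image[::-1], mask[::-1]        # vertical flip
    if rng.rand() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]  # horizontal flip
    k = rng.randint(4)
    return np.rot90(image, k), np.rot90(mask, k)     # rotate by a multiple of 90 degrees
```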

As seen in Figure 5 , the model architecture is U-Net inspired, with an ImageNet-pretrained ResNeXt50 encoder (weights from the PyTorch/Fast.ai library). Key characteristics include extracting features at the end of each pretrained ResNeXt block, convolutions within the cross-connections at each feature map size (160, 80, 40, 20, 10) which halve the number of channels, and upsampling using Transpose2D (deconvolution).

FIGURE 5 . Architecture sketch for the U-Net inspired model using a ResNeXt50 encoder as described in Section 6.

6.2. Training

The loss function is an equally weighted combination of binary cross-entropy loss and soft dice loss which empirically produced better results than either loss alone.

We trained the model with 1-cycle and cyclical learning rate schedules, which were first introduced in Leslie Smith's research on faster neural network training with very high learning rates ( Leslie and Topin, 2017 ). In recent practical experience, 1-cycle learning was successfully employed by the Fast.ai team in Stanford's DAWNBench competition ( Stanford DAWNBench, 2018 ) to achieve the fastest and cheapest training of CIFAR-10 and ImageNet classification models to performance benchmarks ( Howard, 2018 ). 1-cycle training functionality is implemented directly in the Fast.ai library and its experimental usage is well documented by the team ( Gugger, 2017b ).

With a batch size of 32 and Stochastic Gradient Descent with momentum as the optimizer, we first warmed up the model by training the un-pretrained decoder layers for 1 epoch at a learning rate of 6. Then we unfroze all weights and started 1-cycle training for 20 epochs, as seen in Figure 6 : starting from a learning rate of 0.15, we linearly increased it to 6 by 45% through training, linearly decreased it back to 0.15 by 90% through training, and decayed the learning rate to 0.0015 in the last 10% of training. Momentum was scaled inversely to the learning rate changes over the same schedule. Figures 7 and 8 show the resulting learning curves and example outputs.

FIGURE 6 . The 1-cycle schedule for learning rate and momentum over training time (as described in Section 6).

FIGURE 7 . Learning curves from experiments for the U-Net adaptation using Fast Learnings for Fast Mapping (as described in Section 6).

FIGURE 8 . Example outputs from probability map to polygonized prediction compared with the ground truth (Fast Learnings for Fast Mapping, Section 6).

After 20 epochs with this 1-cycle learning schedule, we trained for two more 10-epoch cycles with a cyclical learning rate schedule (lr = 0.05 to 1 back to 0.05 over each cycle).
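The authors used the Fast.ai implementation of these schedules; for readers working in plain PyTorch, a roughly comparable setup can be sketched with the built-in OneCycleLR scheduler. Note that OneCycleLR uses two phases (up, then down to a very small final value), so this is our approximation of the three-segment schedule described above, with illustrative hyper-parameters:

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Conv2d(3, 1, 3, padding=1)        # stand-in for the U-Net
optimizer = SGD(model.parameters(), lr=0.15, momentum=0.95)

steps_per_epoch = 100                               # placeholder; use len(train_loader) in practice
scheduler = OneCycleLR(
    optimizer,
    max_lr=6.0,                 # very high peak learning rate, as in super-convergence
    epochs=20,
    steps_per_epoch=steps_per_epoch,
    pct_start=0.45,             # fraction of the cycle spent ramping the learning rate up
    anneal_strategy="linear",
    div_factor=40.0,            # initial lr = max_lr / 40 = 0.15
    final_div_factor=100.0,     # final lr = 0.15 / 100 = 0.0015
    cycle_momentum=True,        # momentum is scaled inversely to the learning rate
    base_momentum=0.85,
    max_momentum=0.95,
)

for epoch in range(20):
    for step in range(steps_per_epoch):
        # forward pass, loss computation and loss.backward() omitted in this sketch
        optimizer.step()
        scheduler.step()
```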

Other key model training techniques employed (available out-of-box through Fast.ai library) include gradient clipping to minimize risk of gradient explosion, a learning rate finder utility to select the highest possible learning rate without divergence up front ( Smith, 2017 ; Gugger, 2017a ) and discriminative fine-tuning ( Howard and Ruder, 2018 ) to train earlier layers at smaller learning rates than later layers.

6.3. Post Processing

For inference, we performed 8× test-time augmentation (every possible flip and 90° rotation) and took the geometric mean of all eight outputs to create the probability map. We used a probability threshold to create binary masks, labeled each separated mask as a building footprint instance, and converted to the polygon submission format. We filtered out very small areas (less than 15 pixel²) during polygonizing to reduce false positives. We calculated the confidence score of each building instance as the average pixel-wise probability value over the area of each instance.
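The 8-way test-time augmentation with a geometric mean can be sketched as follows (our own NumPy illustration; `model` stands for any callable that maps an (H, W, 3) image to an (H, W) probability map):

```python
import numpy as np

def predict_with_tta(model, image, eps=1e-7):
    """8-way test-time augmentation: every horizontal-flip / 90-degree-rotation combination,
    with the geometric mean of the eight probability maps as the final output."""
    probs = []
    for flip in (False, True):
        img = image[:, ::-1] if flip else image
        for k in range(4):
            p = model(np.rot90(img, k))              # predict on the augmented image
            p = np.rot90(p, -k)                      # undo the rotation
            probs.append(p[:, ::-1] if flip else p)  # undo the flip
    return np.exp(np.log(np.stack(probs) + eps).mean(axis=0))  # geometric mean
```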

As reported in Table 2 , our final performance on the held-out test set was an AP at IoU ≥ 0.5 of 0.918 and an AR at IoU ≥ 0.5 of 0.929, with a model trained for 40 epochs total: 1-cycle for 20 epochs, then 2 × 10 epochs with a cyclical learning rate. After just the 1-cycle training for 20 epochs, our average precision/recall was already close to the best at 0.917 and 0.922. It is worth considering whether the extra 20 epochs are worth the performance gain, or whether a single 1-cycle schedule run for 40 epochs would achieve even better results.

TABLE 2 . An overview of experimental results: Shown is performance of the different architectures when using different loss functions during training (for the adapted U-Net architecture, Section 8).

Toward our external objective of developing fast, lightweight models that achieve top-5 performance, our model training time was 1.6 days (1.9 h/epoch × 20 epochs), plus an additional 1.6 days for the extra 20 epochs that may not have been necessary. Inference took 0.5 h per test-time augmentation run to go through the full held-out test set.

7. Instance Segmentation Using Deeper U-Nets

U-Net, as also referenced in Section 5, is an encoder-decoder network for semantic segmentation which has its origins in medical image segmentation. The model generates a mask for the whole image. This mask then needs to be split into individual sub-masks for the separate buildings.

The model architecture used in this approach is shown in Figure 9 . The left half of the network (the encoder) is similar to a CNN, tasked with coming up with a low dimensional dense representation of the input, and the right side (the decoder) then up-samples the learned feature representations to the same shape as the input. The shortcut connections let information flow from the encoder to the decoder and help the network keep spatial information. As the work of Li et al. (2017) has impressively shown, U-Nets benefit greatly from a deeper model architecture. It allows the model to make much more detailed segmentations; particularly near the object borders, the predictions of a deeper U-Net tend to be more accurate. We used a deep U-Net architecture which was first proposed by Giannakopoulos (2017) .

FIGURE 9 . Architecture of the Deeper U-Net model (described in Section 7).

The inputs are zero padded to a size of 512 × 512 in order to have an image size on which 2 × 2 max pooling can be performed multiple times without having to deal with odd image sizes. Symmetric padding would have been another option instead of zero padding. However, the training images are crops of a larger satellite image and, as mentioned in the previous sections, the smallest and most difficult objects to predict are the ones overflowing from an adjacent tile; symmetric padding would increase the area of those buildings.

For the loss function, dice-loss is combined with binary cross entropy.

The model was trained from scratch for 215 epochs using RMSprop with a learning rate of 0.0001. The training took roughly 5.5 h per epoch on a single Nvidia TITAN Xp GPU. Because of the long training duration, no image augmentation was used. Interestingly, the model did not start to overfit even though no dropout was used, which is likely due to the large size of the training set. The learning curves are displayed in Figure 10 . Notice that the IoU shown in this figure is calculated at the pixel level and is therefore not the same as the IoU in the overall evaluation metric.

FIGURE 10 . Learning curves showing loss and IoU over training time for the Deeper U-Net model (Section 7).

The model produces a probability estimate for every pixel of being an object of interest (building). A threshold of 0.5 was used: pixels with probabilities greater than 0.5 were classified as building. The produced masks were already well separated and were easily transformed into single-building masks by assigning a different label to each connected group of pixels. After this step, masks with fewer than 25 pixels were deleted, which removes small artifacts that should not be counted as buildings. A good cut-off threshold was determined empirically by looking at the distribution of the areas of the small objects in the dataset. We then calculate the bounding boxes for every building mask. The building masks and the bounding boxes form the final predictions.
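The splitting of the binary mask into individual building instances and the removal of small artifacts can be expressed compactly with SciPy's connected-component labeling (our own sketch of the step described above):

```python
import numpy as np
from scipy import ndimage

def split_instances(prob_map, threshold=0.5, min_area=25):
    """Threshold the probability map, label connected pixel groups as separate buildings,
    drop tiny artifacts below `min_area` pixels, and return (mask, bounding box) pairs."""
    binary = prob_map > threshold
    labels, n = ndimage.label(binary)            # connected-component labeling
    bboxes = ndimage.find_objects(labels)        # one bounding-box slice pair per label
    instances = []
    for i in range(1, n + 1):
        mask = labels == i
        if mask.sum() >= min_area:
            instances.append((mask, bboxes[i - 1]))
    return instances
```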

Our final performance on the held-out test set was an AP at IoU ≥ 0.5 of 0.930 and an AR at IoU ≥ 0.5 of 0.956.

Figure 11 shows a prediction by the Deeper U-Net for a sample image from the validation set.

FIGURE 11 . Comparing constructed masks, drawn by a human (A) and predicted by the Deeper U-Net model (B) .

The generated masks are very accurate, and it is expected that the model would also perform well on other categories such as roads, trees, crops, rivers, and lakes. This could make today's Mapathons, where volunteers draw maps from satellite images, completely obsolete. Instead of days or even weeks, maps could be generated in just a few hours from satellite imagery or drone footage. This is crucial for emergency preparedness actors who go to remote areas where no maps exist. Up-to-date maps help them to work efficiently in a crisis situation such as an earthquake.


8. Comparing an Adapted U-Net Architecture for Varying Depths

In this section, we analyze and compare a U-Net-like structure ( Ronneberger et al., 2015 ) at different depths. The architecture was originally derived from a convolutional AutoEncoder structure as used for reconstructing images (see, for example, the Keras tutorial, Chollet (2015) ). This AutoEncoder-type architecture was modified for semantic segmentation: the provided ground truth annotations of the buildings were used as targets for training in order to accomplish the detection of buildings. Furthermore, skip connections were introduced, as found in U-Net, that connect encoding and decoding blocks on the same level. These connections help to recover spatial information, and in our experiments this provided a better reconstruction of details in images compared to post-processing, for example, using conditional random fields ( Krähenbühl and Koltun, 2012 ). This architecture differs from U-Net, first, with respect to the sequence inside the decoding blocks. Following the AutoEncoder approach, the decoder block exactly mirrors the encoder block and consists of a single convolution followed by upsampling (max pooling is used in the encoder block). Second, we used a single convolution of size 5 × 5 . This architecture was used at different depths (stacked blocks of encoders and decoders).

We first describe the applied pre- and post-processing of the dataset. We then introduce the designed network architecture in Section 8.3 and explain our training procedure in Section 8.4. In Section 8.5, results are presented that compare variations of our architecture. The results are then summarized and discussed in Section 10.

8.1. Pre-processing

We normalize each input image individually by subtracting its mean and dividing by its standard deviation.

For early tests we resized the images down from 300 × 300 pixels to 128 × 128 pixels, in order to avoid long training times. For our final models, we ultimately used the full resolution so as not to lose any detail.

8.2. Post-processing

For each of the 300 × 300 pixels in an input image, the network yields a pseudo-probability between 0 and 1, where low values correspond to background (i.e., no building) and high values correspond to foreground (i.e., building).

We binarize these values by setting a threshold θ and assigning 1 if the value is greater than the threshold θ and 0 otherwise.

To address noise in the background, we perform a morphological opening (as provided by OpenCV, Bradski, 2000 ) with a radius of 1, that is, an erosion (which takes the minimum value of a neighborhood) followed by a dilation (which takes the maximum value). This also helps to separate closely connected buildings.
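In code, this post-processing step might look as follows (a minimal OpenCV sketch; the 3 × 3 structuring element is our reading of "radius of 1"):

```python
import cv2
import numpy as np

def binarize_and_open(prob_map, theta=0.5):
    """Threshold the pixel-wise pseudo-probabilities at theta, then apply a morphological
    opening (erosion followed by dilation) to remove background noise and help separate
    closely connected buildings."""
    binary = (prob_map > theta).astype(np.uint8)
    kernel = np.ones((3, 3), np.uint8)           # structuring element for "radius 1"
    return cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
```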

In initial experiments, we tried to apply a CRF for post-processing ( Krähenbühl and Koltun, 2012 ), as CRFs have been used successfully for semantic segmentation before. In a CRF, the output values are made directly dependent on characteristics of the input image, such as edges, and efficient inference on fully-connected CRF models leads to much finer structure in the output. While in many applications this allows the recognition of small details, it did not help in our use case of detecting buildings in aerial images; in fact, the results were worse when applying the CRF. After looking closely at the dataset, we noticed that the desired output maps do not contain particularly fine structures, but are instead fairly regular and in most cases contain rectangular buildings. Aerial images, by contrast, contain fine details (e.g., created by shadows), which a CRF-based post-processing tries to integrate. This seemed to harm the overall performance in our case. Instead, for post-processing it proved more important to establish the general structure of buildings, which meant filling in small holes inside larger areas that had been recognized as a building and which might appear, for example, as a result of occlusion.

8.3. Network Architecture

Our network consists of a series of encoding blocks, followed by as many decoding blocks (see Figure 12 ). We tried networks with 6, 8, 10, and 12 blocks, which we refer to as Net6, Net8, Net10, and Net12, respectively. The deeper the network, i.e., the more blocks it contains, the better it performs. For this particular work, our best evaluated results were obtained with Net12.

FIGURE 12 . Schematic representation of our U-Net-based network architecture (Net10): A sequence of encoding blocks (in blue) on the left and a corresponding sequence of decoding blocks (in green) on the right, with skip connections (dashed arrows) between them. Inside each block, we indicate the size of the internal representation at that stage. We show the setup of all encoding and decoding blocks at the bottom. (Comparing an adapted U-Net architecture for Varying Depths).

The composition of individual blocks always follows the same structure:

• Encoding blocks consist of a 5 × 5 convolution layer (padded with a stride of 1 × 1 ) with an increasing number of 64, 128, 192, 256, and 320 filters, followed by a Rectified Linear Unit. The resulting features are then normalized using batch normalization ( Ioffe and Szegedy, 2015 ), and downsampled by a 2 × 2 max pooling operation.

• Decoding blocks are symmetric to the encoding blocks, also using a 5 × 5 convolution (padded with a stride of 1 × 1 ), followed by a Rectified Linear Unit, batch normalization and a 2 × 2 upsampling operation, where the low dimensional features of the previous layer are resized. Additionally, a dropout layer ( Srivastava et al., 2014 ), omitting 30 % of the neurons, is added in the decoding blocks.

Additionally, we integrate skip connections ( Ronneberger et al., 2015 ). For this, the outputs of corresponding (see Figure 12 ) encoding and decoding blocks are concatenated and fed as inputs to the next block. The introduction of skip connections improved results greatly, even during our initial experiments with the downsampled data. In particular, skip connections helped bring out more detailed structures of buildings.
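A sketch of how such encoding and decoding blocks with skip connections can be written in Keras is shown below (our own illustration with only two levels; the exact point at which the skip output is taken and the filter counts follow our reading of Figure 12 rather than the authors' code):

```python
from tensorflow.keras import Input, Model, layers

def encoding_block(x, filters):
    """5x5 convolution -> ReLU -> batch normalization -> 2x2 max pooling."""
    x = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    skip = x                                    # kept for the skip connection
    x = layers.MaxPooling2D(2)(x)
    return x, skip

def decoding_block(x, skip, filters, dropout=0.3):
    """Mirrored block: 5x5 convolution -> ReLU -> batch norm -> dropout -> 2x2 upsampling,
    then concatenation with the corresponding encoder output (skip connection)."""
    x = layers.Conv2D(filters, 5, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(dropout)(x)
    x = layers.UpSampling2D(2)(x)
    return layers.Concatenate()([x, skip])

inputs = Input((300, 300, 3))
x, skip1 = encoding_block(inputs, 64)
x, skip2 = encoding_block(x, 128)
x = decoding_block(x, skip2, 128)
x = decoding_block(x, skip1, 64)
outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)   # pixel-wise pseudo-probabilities
model = Model(inputs, outputs)
```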

Our approach is similar to SegNet ( Badrinarayanan et al., 2015 ), which also uses block-wise encoders and decoders as well as upsampling on the decoder side. In contrast to our approach, SegNet always applies multiple convolutions, but of smaller size, and during decoding the convolutions are applied before the upsampling step. Furthermore, in SegNet the indices of the max-pooling layer are used during upsampling, while in our approach information flows through the skip connections, much like in U-Net.

As the overall goal was the detection of buildings within the input images, we set up the network as a binary classifier. Therefore, after the last convolution, we use a sigmoid activation function, to obtain pixel-wise pseudo-probabilities, as discussed above.

In order to match input and output size after downsampling and upsampling, we use zero padding and cropping when necessary.

8.4. Training

For the implementation and training of our approach, we used Keras ( Chollet, 2015 ) with TensorFlow ( Abadi et al., 2016 ) as the backend framework for training our models on multiple GPUs. The same procedure was followed to train the different network configurations Net6, Net8, Net10, and Net12 that differ in depth. The task for all variations of the architecture was to map the 3-channel training data to the corresponding 1-channel ground truth annotation. We adopted the ADAM ( Kingma and Ba, 2014 ) optimizer with the default settings (beta_1 = 0.9 and beta_2 = 0.999) using mini-batch gradient descent. For the network configurations Net6, Net8, and Net10, the initial learning rate was set to α = 10⁻³. At every training step t , the learning rate was decayed according to α ← α ⋅ ( 1 + δ_α ⋅ t )⁻¹ with a decay rate of δ_α = 5 × 10⁻⁵. For Net12 the learning rate was also set to an initial value of α = 10⁻³, but no decay was applied. The batch size was set to 32 or 64 depending on the network configuration, making maximum use of 2 × NVIDIA Tesla P100 or 2 × GeForce GTX 1080 Ti GPUs, respectively. The replicated networks' weights were merged on the local CPU at the end of each epoch.
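For reference, in older (TF-1.x-era) Keras versions this optimizer setup and decay schedule correspond directly to the decay argument of the Adam optimizer. The sketch below uses a placeholder single-layer model standing in for Net6–Net12; newer Keras versions replace the decay argument with learning-rate schedule objects:

```python
from keras.models import Sequential
from keras.layers import Conv2D
from keras.optimizers import Adam

# Placeholder model standing in for Net6/Net8/Net10/Net12.
model = Sequential([
    Conv2D(1, 5, padding="same", activation="sigmoid", input_shape=(300, 300, 3))
])

# Legacy Keras Adam: `decay` applies lr <- lr_0 / (1 + decay * t) per update step,
# matching the schedule described above.
model.compile(optimizer=Adam(lr=1e-3, beta_1=0.9, beta_2=0.999, decay=5e-5),
              loss="binary_crossentropy")
```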

For the first 50 epochs, all networks were trained using the binary cross-entropy loss. For further refinement, the networks were trained for 10 additional epochs with the dice coefficient loss ( Milletari et al., 2016 ).

Using the setup described in this section, training each of the networks Net6, Net8, Net10, and Net12 took about 2.2 days on average.

As we considered this challenge as a binary classification problem (building and non-building classes), the binary cross-entropy is defined as

BCE( y_t , y_p ) = − ( y_t ⋅ log( y_p ) + ( 1 − y_t ) ⋅ log( 1 − y_p ) ),

where y_t is the target and y_p the predicted probability of the building class.

8.5. Experiments and Results

The four network architectures of varying depths presented in this section were trained on the provided dataset according to the training procedure described in Section 8.4. To find the optimal parameters, the threshold ( θ ) was varied from 0.2 to 0.95, with the corresponding square morphology kernel size ( k ) varied between 0 and 5. Even though the pixel-wise accuracy was high because of the dominating background class, the mis-classified pixels led to a decrease in precision and recall.

An overview of the final results is reported in Table 3 , with the morphology kernel for dilation and erosion set to 1. The table shows evaluation results as scored on the official test data set for all networks of varying depths.

TABLE 3 . Precision and recall per epoch at detection (NMS THRESHOLD = 0.5 for the full validation set).

The network configurations Net6, Net8, Net10, and Net12 are evaluated with θ = 0.5. The results listed in Table 3 show that the precision of the network improved when the number of encoding and decoding blocks was increased. This might indicate that a further improvement in the precision of building detection could be achieved by enlarging the network architecture with even more blocks.

The additional training with the dice coefficient loss only slightly enhanced the performance of the networks Net6, Net8, and Net12, and did not help in the case of Net10.

As reported in Table 3 , our final performance on the held-out test set was an AP at IoU ≥ 0.5 of 0.912 and an AR at IoU ≥ 0.5 of 0.942.

9. Instance Segmentation Using Mask R-CNN (Region-Based Convolutional Neural Networks)

In this section, we explore the use of Mask R-CNN, a two-stage object detection architecture, to detect a single class (buildings). We consider Mask R-CNN in contrast to U-Net based approaches because, with an IoU threshold of 0.5, the high pixel-level semantic accuracy that U-Net based pixel classification approaches provide is not strictly required. Single-stage object detection architectures such as SSD ( Liu et al., 2016 ), on the other hand, simply learn bounding box regression and class probabilities. They have faster inference times, but Mask R-CNN has consistently shown better accuracy and also includes a semantic (mask) output.

Mask R-CNN builds on the Region Proposal Network proposed in Faster R-CNN ( Ren et al., 2015 ). This first-stage proposal network (illustrated in Figure 13 ) selects regions of interest from a pre-determined set of anchors, using feature maps from a bottom-up ResNet-101 backbone ( He et al., 2016 ) and a top-down feature pyramid network ( Lin et al., 2017 ). It matches anchors of different sizes and aspect ratios, computed from a set of scales, to objects in an image. To implement our modifications to Mask R-CNN, we started from the open source Matterport implementation ( Abdulla, 2017 ).

FIGURE 13 . Mask R-CNN stages with modifications introduced for the Mapping challenge (see Section 9 for detailed explanation of architecture and modifications).

For the task of instance segmentation on satellite imagery, we considered anchor scales of 8, 16, 32, 64, and 128. We chose smaller anchor sizes because the dataset had a significantly higher proportion of "small" instances and, at the same time, the input images had a maximum size of 300 × 300 pixels. In the training and validation sets, 60% of the instances were medium-sized (area between 1,024 and 9,216 pixel²) and 37% were small (area less than 1,024 pixel²). 19% of the total annotations had an area of less than 256 pixel² (in many cases—16% of the total—because tile borders intersect the edges of buildings). The changed anchor sizes ensure that the region proposal network provides anchors suitable specifically for small objects. A montage containing the generated anchors, the refined anchors with small deltas, the ground truth, and the prediction for an image is shown in Figure 14 .
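In the Matterport implementation, such modifications are typically expressed by subclassing its Config class. The sketch below is our illustrative approximation (attribute values other than the anchor scales are assumptions, and the 10-fold mask-loss weight anticipates the loss weighting discussed later in this section):

```python
from mrcnn.config import Config

class MappingChallengeConfig(Config):
    """Illustrative Mask R-CNN configuration along the lines described in this section."""
    NAME = "mapping-challenge"
    IMAGES_PER_GPU = 4                        # batch size used per GPU
    NUM_CLASSES = 1 + 1                       # background + building
    IMAGE_MIN_DIM = 320                       # 300x300 tiles padded by 10 px on each side
    IMAGE_MAX_DIM = 320
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  # smaller anchors to match the many small buildings
    LOSS_WEIGHTS = {                          # mask loss raised 10-fold near the end of training
        "rpn_class_loss": 1.0,
        "rpn_bbox_loss": 1.0,
        "mrcnn_class_loss": 1.0,
        "mrcnn_bbox_loss": 1.0,
        "mrcnn_mask_loss": 10.0,
    }

config = MappingChallengeConfig()
config.display()                              # print the resulting configuration
```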

FIGURE 14 . Intermediate anchor generation from region proposals and final mask predictions from the detection stage for the Mask R-CNN approach.

In the second detection stage of Mask R-CNN, RoIAlign—a novel feature introduced in Mask R-CNN—aligns the feature maps from the first stage with the input image, resulting in better localization of the output masks.

The Mask R-CNN implementation creates a placeholder for the ground-truth masks that can consume a large amount of memory. Up-sampling the image would create a placeholder for masks with the up-sampled dimensions; changing the stride instead means lower memory requirements. At the same time, with the reduction in down-sampling, the overall compute overhead increases. On an Amazon AWS P3 instance (p3.2xlarge), which uses a single Nvidia V100 GPU with 16 GB of GPU memory, an epoch with a batch size of 4 takes roughly 16 h.

In the Mask R-CNN implementation, anchors are assigned as negative when their IoU with the ground-truth boxes is less than 0.3 and as positive when the IoU is greater than or equal to 0.7. To generate targets for the stage-2 classifier and mask heads without using the RPN head, the default implementation uses a threshold of 0.5 to distinguish between negative and positive ROIs. Assigning negative labels only to ROIs with an IoU between 0.1 and 0.5 ensures that even negative samples have some minimum intersection with the ground truth; ROIs with an IoU of less than 0.1 are discarded. Doing so introduces hard example mining, as even negative samples have to satisfy a minimal intersection criterion. We padded the images by 10 pixels on each side to create the final input image of 320 × 320 pixels. Padding ensures that there are more valid anchors available at the border to select the best matching ROIs, including elongated instances that have a small width or height.

One potential avenue to improve small object detection is to use dilation ( Yu and Koltun, 2015 ), which enlarges the receptive field without losing resolution and so can provide context for detecting small buildings. We set the dilation rate to 2 on all combinations of stages in the ResNet backbone. Since this is a single-class detection dataset, discriminative information from context is perhaps not as useful, but providing context is, in general, a valuable means of detecting small objects.

To train on the Mapping Challenge Dataset, we used the pre-trained model (trained on the MS COCO dataset) to initialize the Mask R-CNN network. Then we used the Mapping Challenge Dataset to train the model for eight epochs, as shown in Table 4 . Epochs 1 to 4 minimized the loss using Stochastic Gradient Descent at a learning rate of 0.001, and epochs 5 to 7 used a learning rate of 0.0001. In epoch 8, we trained on half the samples at a learning rate of 0.0001 and on the other half at a learning rate of 1e-5. We also applied horizontal and vertical flip augmentation during training.

TABLE 4 . Overview and comparison of results for each model.

Epochs 2 to 4 were trained on a smaller subset of the training set, obtained by filtering the dataset so that at least one object with an area of less than 256 pixel² appeared in every image. Medium and large objects already had high precision and recall after epoch 1, as shown in Figure 15 .

FIGURE 15 . Comparison of precision and recall for different object sizes when using the Mask R-CNN approach detailed in Section 9.

Nevertheless, we can see in Figure 15 that from epoch 1 to epoch 8 the precision for small objects improved from 0.706 to 0.82, while the precision for medium objects improved only marginally, from 0.9857 to 0.9874. One explanation is that the smallest anchor scale used for the MS COCO dataset was 32, while ours included the smaller anchor scale of 8. While the hierarchical representation learned by the ResNet backbone on the COCO dataset, especially in the bottom layers, was useful, the Mask R-CNN model had not seen many small object samples, so the whole pipeline required more training to become as effective for smaller objects. Weights trained on samples with the mask loss weight set to 10 were also kept as part of the final model. A straightforward approach that might result in a better model would be to train for four epochs with a learning rate of 0.001, another four with a learning rate of 0.0001, and one epoch with a learning rate of 1e-5, with no sub-setting and no changes to the mask loss weights. At validation and test time, predictions from the images flipped vertically and horizontally are merged with the regular prediction, and non-maximum suppression is then applied to obtain the final prediction annotation.

The multi-task loss in Mask R-CNN includes losses from the region proposal stage and from the detection stage. The region proposal loss consists of a class loss for positive and negative ROIs and an associated bounding box loss. The detection stage contributes an object class loss, a bounding box loss, and a mask loss. Weighting a particular loss affects its contribution to the overall loss. In this implementation, we increased the weight of the mask loss 10-fold near the end of training for better semantic segmentation.

Our final performance on the held-out test set was an AP at IoU ≥ 0.5 of 0.937 and an AR at IoU ≥ 0.5 of 0.959.

10. Conclusion

In this work, we explore different flavors of U-Net and Mask R-CNN on a task of instance segmentation on high resolution satellite imagery to detect buildings. The dataset used was a derivative of the SpaceNet dataset ( Spacenet on aws, 2018 ) and was post-processed to make it more accessible to a broader set of Deep Learning researchers who may not be familiar with the handling and manipulation of raw satellite imagery. The evaluation metric used for all the experiments was designed to incentivize loose segmentation (an IoU ≥ 0.5 was considered a correct detection) of buildings of various shapes and sizes.

In the previous sections, four U-Net implementations were presented, each coming with its own specificities, and one Mask R-CNN approach, which was finally found to be the best performing model. A comparison of some example results for all these different architectures is shown in Figure 16 and detailed results on a test data set are given in Table 5 .

FIGURE 16 . Inference results of each model on a few sample images: Model 1 - Instance segmentation using Deeper U-Net (Section 7); Model 2 - Instance Segmentation using Mask R-CNN (Section 9); Model 3 - Adapted U-Net architecture of Varying Depths (Section 8); Model 4 - Fast Learnings for Fast Mapping Using U-Net (Section 6); Model 5 - Instance Segmentation Using Customized U-Net (Section 5).

TABLE 5 . Explanation of computational steps of the U-Net pipeline (see Figure 2 ).

The task presented two major difficulties arising from the choice of evaluation metric: the designed model had to detect small instances as effectively as larger ones, and it had to distinguish the thin separation areas between closely located buildings. The first U-Net approach, presented in Section 5, relied on a custom weighted loss function to alleviate these difficulties, penalizing more strongly the misclassification of pixels located on small instances or in the separation areas between two close buildings. At testing time, several images were generated by rotating and flipping the original ones, and a gradient-boosting algorithm—Light-GBM—was used to construct the final prediction mask based on the aggregated U-Net outputs. Although this approach reached high performance scores (AP at IoU ≥ 0.5 of 0.938, AR at IoU ≥ 0.5 of 0.946), another U-Net approach, presented in Section 7, achieves similar scores without performing any data augmentation, without introducing custom weight coefficients in the loss, and without using a boosting algorithm on top of the results. The proposed U-Net was, however, modified to be one layer deeper than the original U-Net implementation, which increased the required training time. One approach, which originated from an auto-encoder-like structure, systematically analyzed the influence of the depth of a U-Net-like architecture in Section 8. It further investigated additional improvements on top of the auto-encoder network, such as the use of image processing tools like the morphological opening filter and CRFs. Deeper U-Nets were shown to further improve performance, but not only does the training time increase, the gains also grow smaller. This might be further enhanced by introducing data augmentation, as done in the first approach, which might become more important with a growing number of layers and parameters. In Section 6, another U-Net is presented that applies other adaptations during training: it was trained following customized learning rate and momentum schedules, which reduced the time needed to train the model to convergence. Both of these U-Net approaches reached high scores similar to the ones attained by the two approaches presented first. Finally, the best performing model is the one presented in Section 9, which proposes an implementation of Mask R-CNN to solve the segmentation task. To improve the detection of small instances, the anchor sizes were lowered compared to the original implementation. This model, once trained, reached an AP at IoU ≥ 0.5 of 0.937 and an AR at IoU ≥ 0.5 of 0.959.

All the approaches presented in this paper were found to be efficient ways of solving the building segmentation task proposed on satellite images. Interestingly, each approach came with its own adaptations, yet the different proposed models reached similar AP at IoU ≥ 0.5 and AR at IoU ≥ 0.5 scores.

Data Availability Statement

All the datasets used in the experiments in the paper are available at: https://www.aicrowd.com/challenges/mapping-challenge . The test dataset is not released, as the goal of this work is to set up an ongoing benchmark. However, participants can evaluate their models by submitting solutions to the page here: https://www.aicrowd.com/challenges/mapping-challenge .

Author Contributions

The contributors to the paper are SM, JC, KK, AP, PT, SK, JR, DL, MP, SF, JG, MSc, AT, GM, NR, MSa. Out of the above contributors SM came up with the idea, baseline, and the required dataset aggregation and hosting of the challenge. JC, KK, AP, PT, SK, JR, DL, MP, SF, JG, MSc, AT—are participants in the challenge, GM and NR aggregated the results and analysis for the research paper and MSa helped in conceptualising the problem statement. All authors contributed to the article and approved the submitted version.

The research/work of Section 8 was supported by the Cluster of Excellence Cognitive Interaction Technology “CITEC” (EXC 277) at Bielefeld University, which is funded by the German Research Foundation (DFG).

Conflict of Interest

Authors JC, KK were employed by the company neptune.ml; AP, PT were employed by deepsense.ai; SK was employed by NARMA; DL was employed by Anthropocene Labs. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Acknowledgments

The authors would like to thank Lars Bromley (UNOSAT), Miguel Luengo-Oroz (UN Global Pulse) and Paul Vermutllen (Humanity and Inclusion) for their continued support and help in making this work possible. Additionally, the authors would like to thank the authors of the SpaceNet ( Spacenet on aws, 2018 ) dataset for allowing us to use it as the source dataset in the competition.

a The individual training, validation and test sets are available at: https://www.aicrowd.com/challenges/mapping-challenge/dataset_files

b open source framework, available here: https://pytorch.org/

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). "Tensorflow: a system for large-scale machine learning," in 12th USENIX symposium on operating systems design and implementation (OSDI 16), 265–283.

Abdulla, W. (2017). Mask r-cnn for object detection and instance segmentation on keras and tensorflow. Available at: https://github.com/matterport/Mask_RCNN (Accessed October 9, 2020).

Atif, N., Bhuyan, M., and Ahamed, S. (2019). “A review on semantic segmentation from a modern perspective,” in 2019 international conference on electrical, electronics and computer engineering (UPCON) , 2019 November 8 – 10 , Aligarh, India . Piscataway, NJ: IEEE , 1–6.

Badrinarayanan, V., Kendall, A., and Cipolla, R. (2015). “Segnet: a deep convolutional encoder-decoder architecture for image segmentation,” in IEEE Transactions on Pattern Analysis and Machine Intelligence , 1 December 2017 , Vol. 39, No. 12, 2481–2495, CoRR, abs/1511.00561.

Bradski, G. (2000). The OpenCV library. Dr Dobb’s J. Software Tools. 25, 120–125.

Ben Hamida, A., Benoit, A., Lambert, P., Klein, L., Ben Amar, C., and Audebert, N. (2017). “Deep learning for semantic segmentation of remote sensing images with rich spectral content,” in 2017 IEEE international geoscience and remote sensing symposium (IGARSS) , Piscataway, NJ: IEEE . 2569–2572.

Castillo-Navarro, J., Audebert, N., Boulch, A., Le Saux, B., and Lefèvre, S. (2019). “What data are needed for semantic segmentation in earth observation?,” in 2019 joint urban remote sensing event (JURSE) , May 2019 , Vannes, France , 1–4.

Chen, G., Zhang, X., Wang, Q., Dai, F., Gong, Y., and Zhu, K. (2018). Symmetrical dense-shortcut deep fully convolutional networks for semantic segmentation of very-high-resolution remote sensing images. IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing. , 11 (5), 1633–1644. doi:10.1109/jstars.2018.2810320

Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., and Yuille, A. L. (2018a). DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. , 40 (4), 834–848.

Chen, L.-C., Zhu, Y., Papandreou, G., Schroff, F., and Adam, H. (2018b). “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proceedings of the European conference on computer vision (ECCV) , 2018 Sep 8–14 , Munich, Germany , 801–818.

Chollet, F. (2015). Keras. Available at: https://github.com/fchollet/keras and https://blog.keras.io/building-autoencoders-in-keras.html (Accessed October 9, 2020).

Drozdzal, M., Vorontsov, E., Chartrand, G., Kadoury, S., and Pal, C. (2016). “The importance of skip connections in biomedical image segmentation,” in Deep learning and data labeling for medical applications. dlmia 2016, labels 2016. Lecture notes in computer science . Editor G. Carneiro (Cham, Switzerland: Springer ), Vol. 10008. CoRR, abs/1608.04117.

Fukushima, K. (1980). Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybernetics. , 36 (4), 193–202. doi:10.1007/bf00344251

Giannakopoulos, P. (2017). Kaggle-carvana-image-masking-challenge. Available at: https://github.com/petrosgk/Kaggle-Carvana-Image-Masking-Challenge (Accessed October 9, 2020).

Gugger, S. (2017a). How do you find a good learning rate. Available at: https://sgugger.github.io/how-do-you-find-a-good-learning-rate.html . (Accessed October 9, 2020)

Gugger, S. (2017b). The 1cycle policy. Available at: https://sgugger.github.io/the-1cycle-policy.html .

Hao, S., Zhou, Y., and Guo, Y. (2020). A brief survey on semantic segmentation with deep learning. Neurocomputing , 406, 302–321. doi:10.1016/j.neucom.2019.11.118

He, K., Zhang, X., Ren, S., and Sun, J. (2016). “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition , San Juan, Puerto Rico: IEEE , 770–778.

He, K., Gkioxari, G., Dollár, P., and Girshick, R. B. (2017). “Mask R-CNN,” in 2017 IEEE International conference on computer vision (ICCV) , Venice: IEEE , 2980–2988, CoRR, abs/1703.06870.

He, X., Zemel, R. S., and Carreira-Perpiñán, M. Á. (2004). “Multiscale conditional random fields for image labeling,” in Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, 2004 . New York, NY: IEEE , CVPR 2004, Vol. 2, II.

Howard, J. (2018). Training ImageNet in 3 hours for $25; and CIFAR10 for $0.26. Available at: http://www.fast.ai/2018/04/30/dawnbench-fastai/ .

Howard, J, and Ruder, S. (2018). “Universal language model fine-tuning for text classification,” in Proceedings of the 56th annual meeting of the association for computational linguistics (Long Papers), Melbourne, Australia, July 15 - 20, 2018.

Iglovikov, V. I., Seferbekov, S., Buslaev, A. V., and Shvets, A. (2018). TernausNetV2: fully convolutional network for instance segmentation. ArXiv e-prints.

Ioffe, S., and Szegedy, C. (2015). “Batch normalization: accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd international conference on international conference on machine learning, PMLR 37 , Lille, France , Vol. 37, 448–456.

Jégou, S., Drozdzal, M., Vazquez, D., Romero, A., and Bengio, Y. (2017). “The one hundred layers tiramisu: fully convolutional densenets for semantic segmentation,” in Proceedings of the conference on computer vision and pattern recognition workshops (CVPRW), Honolulu, HI , 11–19.

Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., et al. (2017). “Lightgbm: a highly efficient gradient boosting decision tree,” in 31st conference on neural information processing systems (NIPS 2017) , Long Beach, CA . December 2017, 3149–3157.

Kingma, D. P., and Ba, J. (2014). Adam: a method for stochastic optimization. CoRR, abs/1412.6980.

Krähenbühl, P., and Koltun, V. (2012). Efficient inference in fully connected CRFs with Gaussian edge potentials. CoRR, abs/1210.5644.

Krizhevsky, A., Sutskever, I., and Geoffrey, E. (2012). “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems , Cambridge, MA: The MIT Press , 1097–1105.

Lateef, F., and Ruichek, Y. (2019). Survey on semantic segmentation using deep learning techniques. Neurocomputing , 338, 321–348. doi:10.1016/j.neucom.2019.02.003

LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature 521 (7553), 436, doi:10.1038/nature14539

Leslie, S., and Topin, N. (2017). Super-convergence: very fast training of neural networks using large learning rates. arXiv preprint arXiv:1708.07120.

Li, R., Liu, W., Yang, L., Sun, S., Hu, W., Zhang, F., et al. (2017). “DeepUNet: a deep fully convolutional network for pixel-level sea-land segmentation,” in IEEE journal of selected topics in applied earth observations and remote sensing , November 2018 , IEEE , Vol. 11, No. 11, 3954–3962.

Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., and Belongie, S. (2017). “Feature pyramid networks for object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition . (CVPR), Honolulu, HI , 936–944.

Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., et al. (2016). “SSD: single shot multibox detector,” in: Computer Vision – ECCV 2016. ECCV 2016. Lecture Notes in Computer Science. Editors B. Leibe, J. Matas, N. Sebe, M. Welling (Springer, Cham), Vol. 9905.

Long, J., Shelhamer, E., and Darrell, T. (2015). “Fully convolutional networks for semantic segmentation,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015 , 3431–3440.

Milletari, F., Navab, N., and Ahmadi, S.-A. (2016). "V-net: fully convolutional neural networks for volumetric medical image segmentation," in 2016 fourth international conference on 3D vision (3DV), Stanford, CA, 2016, 565–571.

Minaee, S., Boykov, Y., Porikli, F., Plaza, A., Kehtarnavaz, N., and Terzopoulos, D. (2020). Image segmentation using deep learning: a survey. arXiv preprint arXiv:2001.05566.

Ronneberger, O., Fischer, P., and Brox, T. (2015). "U-net: convolutional networks for biomedical image segmentation," in Medical image computing and computer-assisted intervention (MICCAI), volume 9351 of LNCS . New York, NY: Springer , 234–241. Available on arXiv:1505.04597 [cs.CV].

Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst . 28, 91–99.

Shotton, J., Winn, J., Rother, C., and Criminisi, A. (2009). TextonBoost for image understanding: multi-class object recognition and segmentation by jointly modeling texture, layout, and context. Int. J. Comput. Vis. 81 (1), 2–23. doi:10.1007/s11263-007-0109-1

Smith, L. N. (2017). “Cyclical learning rates for training neural networks,” in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV) , Santa Rosa, CA , 464–472.

Spacenet on aws (2018). Available at: https://spacenetchallenge.github.io/datasets/datasetHomePage.html (Accessed July 16, 2018).

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. , 15(1), 1929–1958.

Stanford DAWNBench (2018). Stanford dawnbench competition. Available at: https://dawn.cs.stanford.edu/benchmark .

Thoma, M., (2016). A survey of semantic segmentation. CoRR, abs/1602.06541.

Wu, H., Zhang, J., Huang, K., Liang, K., and Yu, Y. (2019). FastFCN: rethinking dilated convolution in the backbone for semantic segmentation. arXiv preprint arXiv:1903.11816. doi:10.1109/iccv.2019.00687

Xie, S., Girshick, R. B., Dollár, P., Tu, Z., and He, K. (2016). “Aggregated residual transformations for deep neural networks,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , Honolulu, HI . CoRR, abs/1611.05431.

Yu, F., and Koltun, V. (2015). Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:1511.07122.

Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., et al. (2015). “Conditional random fields as recurrent neural networks,” in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, 2015, 1529–1537 CoRR, abs/1502.03240.

Keywords: deep learning, machine learning, remote sensing, satellite imagery, semantic segmentation

Citation: Mohanty SP, Czakon J, Kaczmarek KA, Pyskir A, Tarasiewicz P, Kunwar S, Rohrbach J, Luo D, Prasad M, Fleer S, Göpfert JP, Tandon A, Mollard G, Rayaprolu N, Salathe M and Schilling M (2020) Deep Learning for Understanding Satellite Imagery: An Experimental Survey. Front. Artif. Intell. 3:534696. doi:10.3389/frai.2020.534696

Received: 13 February 2020; Accepted: 22 September 2020; Published: 16 November 2020.

Copyright © 2020 Mohanty, Czakon, Kaczmarek, Pyskir, Tarasiewicz, Kunwar, Rohrbach, Luo, Prasad, Fleer, Göpfert, Tandon, Mollard, Rayaprolu, Salathé, and Schilling. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Nikhil Rayaprolu, [email protected] , Sharada Prasanna Mohanty, [email protected] , Marcel Salathe, [email protected]


Image Processing Techniques for Analysis of Satellite Images for Historical Maps Classification—An Overview

Article outline:

1. Introduction
2. Image Processing in Remote Sensing
   2.1. Image Enhancement
   2.2. Feature Extraction
   2.3. Image Segmentation
   2.4. Image Fusion
   2.5. Change Detection
   2.6. Image Compression
   2.7. Image Classification
   2.8. Image Feature Detection
3. Performance Analysis
4. Discussion
5. Conclusions
Author Contributions
Conflicts of Interest

  • Ping, X.; Bingqiang, C.; Lingyun, X.; Jingcheng, Z.; Lei, Z.; Hangbo, D. A new MNF–BM4D denoising algorithm based on guided filtering for hyperspectral images. ISA Trans. 2019 , 92 , 315–324. [ Google Scholar ]
  • Chang, Y.C. A flexible contrast enhancement method with visual effects and brightness preservation: Histogram planting. Comput. Electr. Eng. 2018 , 69 , 796–807. [ Google Scholar ] [ CrossRef ]
  • Suresh, S.; Lal, S. Modified differential evolution algorithm for contrast and brightness enhancement of satellite images. Appl. Soft Comput. J. 2017 , 61 , 622–641. [ Google Scholar ] [ CrossRef ]
  • Singh, H.; Kumar, A.; Balyan, L.K.; Singh, G.K. A novel optimally weighted framework of piecewise gamma corrected fractional order masking for satellite image enhancement. Comput. Electr. Eng. 2019 , 75 , 245–261. [ Google Scholar ] [ CrossRef ]
  • Tang, S.; Wu, B.; Zhu, Q. Combined adjustment of multi-resolution satellite imagery for improved geo-positioning accuracy. ISPRS J. Photogramm. Remote Sens. 2016 , 114 , 125–136. [ Google Scholar ] [ CrossRef ]
  • Vijayaraj, V.; Bright, E.A.; Bhaduri, B.L. Rapid damage assessment from high resolution imagery. Int. Geosci. Remote Sens. Symp. 2008 , 3 , 1445–1448. [ Google Scholar ]
  • Yuan, X.; Chen, S.; Yuan, W.; Cai, Y. Poor textural image tie point matching via graph theory. ISPRS J. Photogramm. Remote Sens. 2017 , 129 , 21–31. [ Google Scholar ] [ CrossRef ]
  • Sedaghat, A.; Mohammadi, N. Uniform competency-based local feature extraction for remote sensing images. ISPRS J. Photogramm. Remote Sens. 2018 , 135 , 142–157. [ Google Scholar ] [ CrossRef ]
  • Rathore, M.M.U.; Ahmad, A.; Paul, A.; Wu, J. Real-time continuous feature extraction in large size satellite images. J. Syst. Archit. 2016 , 64 , 122–132. [ Google Scholar ] [ CrossRef ]
  • Zhang, L.; Sun, Q. Saliency detection and region of interest extraction based on multi-image common saliency analysis in satellite images. Neurocomputing 2018 , 283 , 150–165. [ Google Scholar ] [ CrossRef ]
  • Huang, X.; Chen, H.; Gong, J. Angular difference feature extraction for urban scene classification using ZY-3 multi-angle high-resolution satellite imagery. ISPRS J. Photogramm. Remote Sens. 2018 , 135 , 127–141. [ Google Scholar ] [ CrossRef ]
  • Schmidt, A.; Lafarge, F.; Brenner, C.; Rottensteiner, F.; Heipke, C. Forest point processes for the automatic extraction of networks in raster data. ISPRS J. Photogramm. Remote Sens. 2017 , 126 , 38–55. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Ciecholewski, M. River channel segmentation in polarimetric SAR images: Watershed transform combined with average contrast maximisation. Expert Syst. Appl. 2017 , 82 , 196–215. [ Google Scholar ] [ CrossRef ]
  • Suresh, S.; Lal, S. An efficient cuckoo search algorithm based multilevel thresholding for segmentation of satellite images using different objective functions. Expert Syst. Appl. 2016 , 58 , 184–209. [ Google Scholar ] [ CrossRef ]
  • Grinias, I.; Panagiotakis, C.; Tziritas, G. MRF-based segmentation and unsupervised classification for building and road detection in peri-urban areas of high-resolution satellite images. ISPRS J. Photogramm. Remote Sens. 2016 , 122 , 145–166. [ Google Scholar ] [ CrossRef ]
  • Marmanis, D.; Schindler, K.; Wegner, J.D.; Galliani, S.; Datcu, M.; Stilla, U. Classification with an edge: Improving semantic image segmentation with boundary detection. ISPRS J. Photogramm. Remote Sens. 2018 , 135 , 158–172. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Pare, S.; Bhandari, A.K.; Kumar, A.; Singh, G.K. A new technique for multilevel color image thresholding based on modified fuzzy entropy and Lévy flight firefly algorithm. Comput. Electr. Eng. 2018 , 70 , 476–495. [ Google Scholar ] [ CrossRef ]
  • Alshehhi, R.; Marpu, P.R. Hierarchical graph-based segmentation for extracting road networks from high-resolution satellite images. ISPRS J. Photogramm. Remote Sens. 2017 , 126 , 245–260. [ Google Scholar ] [ CrossRef ]
  • Anjos, A. dos; Shahbazkia, H.R. Bi-Level Image Thresholding. In Proceedings of the First International Conference on Bio-inspired Systems and Signal Processing, Funchal, Portugal, 28–31 January 2008; pp. 70–76. [ Google Scholar ]
  • Zheng, X.; Ye, H.; Tang, Y. Image bi-level thresholding based on gray level-local variance histogram. Entropy 2017 , 19 , 191. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017 , 39 , 2481–2495. [ Google Scholar ] [ CrossRef ]
  • Bhandari, A.K.; Kumar, A.; Singh, G.K. Tsallis entropy based multilevel thresholding for colored satellite image segmentation using evolutionary algorithms. Expert Syst. Appl. 2015 , 42 , 8707–8730. [ Google Scholar ] [ CrossRef ]
  • Su, T.; Zhang, S. Local and global evaluation for remote sensing image segmentation. ISPRS J. Photogramm. Remote Sens. 2017 , 130 , 256–276. [ Google Scholar ] [ CrossRef ]
  • Tian, J.; Dezert, J. Fusion of multispectral imagery and DSMs for building change detection using belief functions and reliabilities. Int. J. Image Data Fusion 2018 , 9832 , 1–27. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Ferraris, V.; Dobigeon, N.; Wei, Q.; Chabert, M. Detecting Changes Between Optical Images of Different Spatial and Spectral Resolutions: A Fusion-Based Approach. IEEE Trans. Geosci. Remote Sens. 2016 , 56 , 1566–1578. [ Google Scholar ] [ CrossRef ]
  • Gangkofner, U.G.; Pradhan, P.S.; Holcomb, D.W. Optimizing the high-pass filter addition technique for image fusion. Photogramm. Eng. Remote Sens. 2008 , 74 , 1107–1118. [ Google Scholar ] [ CrossRef ]
  • Al-wassai, F.A.; Kalyankar, N.V.; Al-zuky, A.A. Arithmetic and Frequency Filtering Methods of Pixel-Based Image Fusion Techniques. Int. J. Comput. Sci. Issues 2011 , 8 , 113–122. [ Google Scholar ]
  • de Béthune, S.; Muller, F.; Binard, M. Adaptive intensity matching filters: A new tool for multiresolution data fusion. In Proceedings of the Multi-Sensor Systems and Data Fusion for Telecommunications, Remote Sensing and Radar, Lisbon, Portugal, 29 September–22 October 1997; pp. 28.1–28.15. [ Google Scholar ]
  • Cornet, Y.; de Bethune, S.; Binard, M.; Muller, F.; Legros, G.; Nadasdi, I. RS data fusion by local mean and variance matching algorithms: Their respective efficiency in a complex urban context. In Proceedings of the IEEEASPRS Joint Workshop on Remote Sensing and Data Fusion over Urban Areas, Rome, Italy, 8–9 November 2001; pp. 105–109. [ Google Scholar ]
  • Gaikwad, A.J.; Vishwakarma, S.; Bhalerao, S. A Review on Various Image Fusion Techniques. Int. J. Adv. Res. Comput. Commun. Eng. ISO 2016 , 5 , 144–149. [ Google Scholar ]
  • Chang, N.; Member, S.; Bai, K.; Imen, S.; Chen, C.; Gao, W. Multisensor Satellite Image Fusion and Networking for All-Weather Environmental Monitoring. IEEE Syst. J. 2018 , 12 , 1341–1357. [ Google Scholar ] [ CrossRef ]
  • Mascarenhas, N.D.A.; Banon, G.J.F.; Candeias, A.L.B. Multispectral image data fusion under a Bayesian approach. Int. J. Remote Sens. 1996 , 17 , 1457–1471. [ Google Scholar ] [ CrossRef ]
  • Fasbender, D.; Brasseur, O.; Bogaert, P. Bayesian data fusion for space–Time prediction of air pollutants: The case of NO 2 in Belgium. Atmos. Environ. 2009 , 43 , 4632–4645. [ Google Scholar ] [ CrossRef ]
  • Chang, N.; Vannah, B.W.; Yang, Y.J. Integrated data fusion and mining techniques for monitoring total organic carbon concentrations in a lake. Int. J. Remote Sens. 2014 , 35 , 1064–1093. [ Google Scholar ] [ CrossRef ]
  • Luo, X.; Zhang, Z.; Wu, X. A novel algorithm of remote sensing image fusion based on shift-invariant Shearlet transform and regional selection. Int. J. Electron. Commun. 2016 , 70 , 186–197. [ Google Scholar ] [ CrossRef ]
  • Maurya, L.; Mahapatra, P.K.; Kumar, A. A social spider optimized image fusion approach for contrast enhancement and brightness preservation. Appl. Soft Comput. 2017 , 52 , 575–592. [ Google Scholar ] [ CrossRef ]
  • Zhu, Z.; Yin, H.; Chai, Y.; Li, Y.; Qi, G. A novel multi-modality image fusion method based on image decomposition and sparse representation. Inf. Sci. 2018 , 432 , 516–529. [ Google Scholar ] [ CrossRef ]
  • Anandhi, D.; Valli, S. An algorithm for multi-sensor image fusion using maximum a posteriori and nonsubsampled contourlet transform. Comput. Electr. Eng. 2018 , 65 , 139–152. [ Google Scholar ] [ CrossRef ]
  • Li, Y.; He, Z.; Zhu, H.; Zhang, W.; Wu, Y. Jointly registering and fusing images from multiple sensors. Inf. Fusion 2016 , 27 , 85–94. [ Google Scholar ] [ CrossRef ]
  • Kim, M.; Han, D.K.; Ko, H. Joint patch clustering-based dictionary learning for multimodal image fusion. Inf. Fusion 2016 , 27 , 198–214. [ Google Scholar ] [ CrossRef ]
  • Wu, M.; Huang, W.; Niu, Z.; Wang, C.; Li, W.; Yu, B. Validation of synthetic daily Landsat NDVI time series data generated by the improved spatial and temporal data fusion approach. Inf. Fusion 2018 , 40 , 34–44. [ Google Scholar ] [ CrossRef ]
  • Zou, J.; Li, W.; Chen, C.; Du, Q. Scene classification using local and global features with collaborative representation fusion. Inf. Sci. 2016 , 348 , 209–226. [ Google Scholar ] [ CrossRef ]
  • Li, X.; Wang, L.; Wang, J.; Zhang, X. Multi-focus image fusion algorithm based on multilevel morphological component analysis and support vector machine. IET Image Process 2017 , 11 , 919–926. [ Google Scholar ] [ CrossRef ]
  • Roy, D.P.; Ju, J.; Lewis, P.; Schaaf, C.; Gao, F.; Hansen, M.; Lindquist, E. Multi-temporal MODIS-Landsat data fusion for relative radiometric normalization, gap filling, and prediction of Landsat data. Remote Sens. Environ. 2008 , 112 , 3112–3130. [ Google Scholar ] [ CrossRef ]
  • Gao, F.; Hilker, T.; Zhu, X.; Anderson, M.; Masek, J.; Wang, P.; Yang, Y. Fusing Landsat and MODIS Data for Vegetation Monitoring. IEEE Geosci. Remote Sens. Mag. 2015 , 3 , 47–60. [ Google Scholar ] [ CrossRef ]
  • Gong, M.; Zhan, T.; Zhang, P.; Miao, Q. Superpixel-Based Difference Representation Learning for Change Detection in Multispectral. IEEE Trans. Geosci. Remote Sens. 2017 , 55 , 2658–2673. [ Google Scholar ] [ CrossRef ]
  • Asokan, A.; Anitha, J. Change detection techniques for remote sensing applications: A survey. Earth Sci. Inform. 2019 , 12 , 143–160. [ Google Scholar ] [ CrossRef ]
  • Gandhi, M.S.P.; Thummalu, N.A.C. Ndvi: Vegetation change detection using remote sensing and gis—A case study of Vellore District. In Proceedings of the 3rd International Conference on Recent Trends in Computing (ICRTC), Delhi, India, 12–13 March 2015; pp. 1199–1210. [ Google Scholar ]
  • Rawat, J.S.; Kumar, M. Monitoring land use/cover change using remote sensing and GIS techniques: A case study of Hawalbagh block, district Almora, Uttarakhand, India. Egypt. J. Remote Sens. Space Sci. 2015 , 18 , 77–84. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Xu, D.; Chen, R.; Xing, X.; Lin, W. Detection of decreasing vegetation cover based on empirical orthogonal function and temporal unmixing analysis. Math. Probl. Eng. 2017 , 2017 . [ Google Scholar ] [ CrossRef ]
  • Jabari, S.; Zhang, Y. RPC-Based Coregistration of VHR Imagery for Urban Change Detection. Photogramm. Eng. Remote Sens. 2016 , 82 , 521–534. [ Google Scholar ] [ CrossRef ]
  • Haque, I.; Basak, R. Land cover change detection using GIS and remote sensing techniques: A spatio-temporal study on Tanguar Haor, Sunamganj, Bangladesh. Egypt. J. Remote Sens. Space Sci. 2017 , 20 , 251–263. [ Google Scholar ] [ CrossRef ]
  • Zhu, Z. Change detection using Landsat time series: A review of frequencies, preprocessing, algorithms, and applications. ISPRS J. Photogramm. Remote Sens. 2017 , 130 , 370–384. [ Google Scholar ] [ CrossRef ]
  • Verbesselt, J.; Hyndman, R.; Newnham, G.; Culvenor, D. Detecting trend and seasonal changes in satellite image time series. Remote Sens. Environ. 2010 , 114 , 106–115. [ Google Scholar ] [ CrossRef ]
  • Qin, D.; Zhou, X.; Zhou, W.; Huang, G.; Ren, Y.; Horan, B.; He, J.; Kito, N. MSIM: A change detection framework for damage assessment in natural disasters. Expert Syst. Appl. 2018 , 97 , 372–383. [ Google Scholar ] [ CrossRef ]
  • Guttler, F.; Ienco, D.; Nin, J.; Teisseire, M.; Poncelet, P. A graph-based approach to detect spatiotemporal dynamics in satellite image time series. ISPRS J. Photogramm. Remote Sens. 2017 , 130 , 92–107. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Genitha, C.H.; Rajesh, R.K. A technique for multi-spectral satellite image compression using EZW algorithm. In Proceedings of the 2016 International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), Kumaracoil, India, 16–17 December 2016; Volume 1, pp. 689–694. [ Google Scholar ]
  • Sahnoun, K.; Benabadji, N. Satellite image compression algorithm based on the FFT. Int. J. Multimed. Its Appl. 2014 , 6 , 301–310. [ Google Scholar ] [ CrossRef ]
  • Liao, L.; Xiao, J.; Li, Y.; Wang, M.; Hu, R. Learned representation of satellite image series for data compression. Remote Sens. 2020 , 12 , 497. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Shihab, H.S.; Shafie, S.; Ramli, A.R.; Ahmad, F. Enhancement of Satellite Image Compression Using a Hybrid (DWT–DCT) Algorithm. Sens. Imaging 2017 , 18 , 30. [ Google Scholar ] [ CrossRef ]
  • Hagag, A.; Hassan, E.S.; Amin, M.; Abd El-Samie, F.E.; Fan, X. Satellite multispectral image compression based on removing sub-bands. Optik 2017 , 131 , 1023–1035. [ Google Scholar ] [ CrossRef ]
  • Shi, C.; Zhang, J.; Zhang, Y. Content-Based Onboard Compression for Remote Sensing Images. Neurocomputing 2016 , 191 , 330–340. [ Google Scholar ] [ CrossRef ]
  • Marchetti, Y.; Nguyen, H.; Braverman, A.; Cressie, N. Spatial data compression via adaptive dispersion clustering. Comput. Stat. Data Anal. 2018 , 117 , 138–153. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Kong, W.; Wu, J.; Hu, Z.; Anisetti, M.; Damiani, E.; Jeon, G. Lossless compression for aurora spectral images using fast online bi-dimensional decorrelation method. Inf. Sci. 2017 , 381 , 33–45. [ Google Scholar ] [ CrossRef ]
  • Xiao, B.; Lu, G.; Zhang, Y.; Li, W.; Wang, G. Lossless image compression based on integer Discrete Tchebichef Transform. Neurocomputing 2016 , 214 , 587–593. [ Google Scholar ] [ CrossRef ]
  • Medjahed, S.A.; Ait Saadi, T.; Benyettou, A.; Ouali, M. A new post-classification and band selection frameworks for hyperspectral image classification. Egypt. J. Remote Sens. Space Sci. 2016 , 19 , 163–173. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Ming, D.; Zhou, T.; Wang, M.; Tan, T. Land cover classification using random forest with genetic algorithm-based parameter optimization. J. Appl. Remote Sens. 2016 , 10 , 035021. [ Google Scholar ] [ CrossRef ]
  • Thakur, N.; Maheshwari, D. A review of image classification techniques. Int. Res. J. Eng. Technol. 2017 , 4 , 1588–1591. [ Google Scholar ]
  • Tian, S.; Zhang, X.; Tian, J.; Sun, Q. Random forest classification of wetland landcovers from multi-sensor data in the arid region of Xinjiang, China. Remote Sens. 2016 , 8 , 954. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Villa, A.; Chanussot, J.; Benediktsson, J.A.; Jutten, C. Spectral unmixing for the classification of hyperspectral images at a finer spatial resolution. IEEE J. Sel. Top. Signal. Process. 2011 , 5 , 521–533. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Baghbaderani, R.K.; Wang, F.; Stutts, C.; Qu, Y.; Qi, H. Hybrid Spectral Unmixing in Land-Cover Classification. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 3009–3012. [ Google Scholar ]
  • Degerickx, J.; Roberts, D.A.; Somers, B. Enhancing the performance of Multiple Endmember Spectral Mixture Analysis (MESMA) for urban land cover mapping using airborne lidar data and band selection. Remote Sens. Environ. 2019 , 221 , 260–273. [ Google Scholar ] [ CrossRef ]
  • Singh, A.; Singh, K.K. Satellite image classification using Genetic Algorithm trained radial basis function neural network, application to the detection of flooded areas. J. Vis. Commun. Image Represent. 2017 , 42 , 173–181. [ Google Scholar ] [ CrossRef ]
  • Ma, L.; Li, M.; Ma, X.; Cheng, L.; Du, P.; Liu, Y. A review of supervised object-based land-cover image classification. ISPRS J. Photogramm. Remote Sens. 2017 , 130 , 277–293. [ Google Scholar ] [ CrossRef ]
  • Liu, Q.; Hang, R.; Song, H.; Li, Z. Learning multiscale deep features for high-resolution satellite image scene classification. IEEE Trans. Geosci. Remote Sens. 2018 , 56 , 117–126. [ Google Scholar ] [ CrossRef ]
  • Alonso-Montesinos, J.; Martínez-Durbán, M.; del Sagrado, J.; del Águila, I.M.; Batlles, F.J. The application of Bayesian network classifiers to cloud classification in satellite images. Renew. Energy 2016 , 97 , 155–161. [ Google Scholar ] [ CrossRef ]
  • Feizizadeh, B. A novel approach of fuzzy Dempster-Shafer theory for spatial uncertainty analysis and accuracy assessment of object-based image classification. IEEE Geosci. Remote Sens. Lett. 2018 , 15 , 18–22. [ Google Scholar ] [ CrossRef ]
  • Dhivya, R.; Prakash, R. Edge detection of satellite image using fuzzy logic. Cluster Comput. 2019 , 22 , 11891–11898. [ Google Scholar ] [ CrossRef ]
  • Persello, C.; Stein, A. Deep Fully Convolutional Networks for the Detection of Informal Settlements in VHR Images. IEEE Geosci. Remote Sens. Lett. 2017 , 14 , 2325–2329. [ Google Scholar ] [ CrossRef ]
  • Lin, C.H.; Lin, B.Y.; Lee, K.Y.; Chen, Y.C. Radiometric normalization and cloud detection of optical satellite images using invariant pixels. ISPRS J. Photogramm. Remote Sens. 2015 , 106 , 107–117. [ Google Scholar ] [ CrossRef ]
  • Liasis, G.; Stavrou, S. Satellite images analysis for shadow detection and building height estimation. ISPRS J. Photogramm. Remote Sens. 2016 , 119 , 437–450. [ Google Scholar ] [ CrossRef ]
  • Wu, T.; Hu, X.; Zhang, Y.; Zhang, L.; Tao, P.; Lu, L. Automatic cloud detection for high resolution satellite stereo images and its application in terrain extraction. ISPRS J. Photogramm. Remote Sens. 2016 , 121 , 143–156. [ Google Scholar ] [ CrossRef ]
  • Wang, M.; Fan, C.; Pan, J.; Jin, S.; Chang, X. Image jitter detection and compensation using a high-frequency angular displacement method for Yaogan-26 remote sensing satellite. ISPRS J. Photogramm. Remote Sens. 2017 , 130 , 32–43. [ Google Scholar ] [ CrossRef ]
  • Mera, D.; Bolon-Canedo, V.; Cotos, J.M.; Alonso-Betanzos, A. On the use of feature selection to improve the detection of sea oil spills in SAR images. Comput. Geosci. 2017 , 100 , 166–178. [ Google Scholar ] [ CrossRef ]
  • Cao, L.; Wang, C.; Li, J. Vehicle detection from highway satellite images via transfer learning. Inf. Sci. 2016 , 366 , 177–187. [ Google Scholar ] [ CrossRef ]
  • Santra, A.K.; Christy, C.J. Genetic Algorithm and Confusion Matrix for Document Clustering. Int. J. Comput. Sci. Issues 2012 , 9 , 322–328. [ Google Scholar ]
  • Novaković, J.D.; Veljović, A.; Ilić, S.S.; Papić, Z.; Milica, T. Evaluation of Classification Models in Machine Learning. Theory Appl. Math. Comput. Sci. 2017 , 7 , 39–46. [ Google Scholar ]
  • Foody, G. Local characterization of thematic classification accuracy through spatially constrained confusion matrices. Int. J. Remote 2005 , 26 , 1217–1228. [ Google Scholar ] [ CrossRef ]
  • Andualem, T.G.; Belay, G.; Guadie, A. Land Use Change Detection Using Remote Sensing Technology. J. Earth Sci. Clim. Chang. 2018 , 9 , 1–6. [ Google Scholar ] [ CrossRef ]
  • Olofsson, P.; Foody, G.M.; Herold, M.; Stehman, S.V.; Woodcock, C.E.; Wulder, M.A. Good practices for estimating area and assessing accuracy of land change. Remote Sens. Environ. 2014 , 148 , 42–57. [ Google Scholar ] [ CrossRef ]
  • Ariza-López, F.J.; Rodríguez-Avi, J.; Alba-Fernández, M.V.; García-Balboa, J.L. Thematic accuracy quality control by means of a set of multinomials. Appl. Sci. 2019 , 9 , 4240. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Strahler, A.H.; Boschetti, L.; Foody, G.M.; Friedl, M.A.; Hansen, M.C.; Herold, M.; Mayaux, P.; Morisette, J.T.; Stehman, S.V.; Woodcock, C.E. Global Land Cover Validation: Recommendations for Evaluation and Accuracy Assessment of Global Land Cover Maps; Technical Report of Institute of Environmental Sustainability; Joint Research Center: Ispra, Italy, 2006; pp. 1–60. [ Google Scholar ]

Image enhancement methods:

Method | Test Images | Performance Metrics
1. Histogram Planting [ ] | Phobos images of aircraft | Mean, Average Information
2. Modified differential evolution [ ] | Images from NASA, Satpalda Geospatial Services and Satellite Imaging Corp. | Peak Signal to Noise Ratio (PSNR), Mean Square Error (MSE), Universal Quality Index (UQI), Normalized Absolute Error (NAE), Structural Contrast-Quality Index (SC-QI)
3. Fractional Differential (FD) unsharp masking [ ] | Infrared images | Average brightness (B), average contrast (V), average discrete information content (entropy, E), sharpness (S), colorfulness (C), correlation (R), energy (E) and homogeneity (M)
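Several of the quality measures in these comparisons (MSE, PSNR) have simple closed forms. As a rough, self-contained illustration, the sketch below computes them with NumPy; the arrays are random placeholders, not data from any of the cited studies.

```python
import numpy as np

def mse(reference: np.ndarray, test: np.ndarray) -> float:
    """Mean squared error between two images of equal shape."""
    return float(np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2))

def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for images with the given dynamic range."""
    error = mse(reference, test)
    if error == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / error)

# Placeholder 8-bit "original" band and a slightly perturbed "enhanced" version
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
enhanced = np.clip(reference + rng.normal(0, 5, size=reference.shape), 0, 255).astype(np.uint8)

print(f"MSE  = {mse(reference, enhanced):.2f}")
print(f"PSNR = {psnr(reference, enhanced):.2f} dB")
```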
Feature extraction methods:

Method | Test Images | Feature Extracted | Performance Metrics
1. Uniform Competency Feature Extraction [ ] | Synthetic images: ETM+, World View 2; real images: SPOT4, SPOT6, Quick Bird, GeoEye1, ASTER | Rotation and scale invariant local features | Recall, Precision, Root Mean Square Error
2. RepTree, Machine Learning and Euclidean distance [ ] | Envisat images | Continuous features such as roads, rivers | Efficiency and processing time
3. Multi image saliency analysis [ ] | SPOT 5 images | ROI extraction such as clouds | Recall = 0.8, precision = 0.75, F-measure = 0.75
4. Digital Surface Models [ ] | ZY-3 satellite images | Pixel and feature level extraction of urban scenes | Overall accuracy = 96%, kappa coefficient = 0.862
5. Reversible jump Markov chain Monte Carlo sampler [ ] | Synthetic image | Extraction of rivers, channels and roads | Completeness = 98.8%, correctness = 94% and quality = 92.9%
Image segmentation methods:

Method | Test Images | Performance Metrics
1. Cuckoo Search, McCulloch’s method [ ] | Pleiades satellite images | PSNR = 21.9833, MSE = 411.8573, FSIM = 0.9595
2. Markov Random Field method [ ] | QuickBird bands with four spectral bands | F-measure = 0.7427, Precision = 0.7170, Recall = 0.7704
3. Deep convolutional Neural Network [ ] | IR images | Accuracy = 89.9%, Confusion matrix
4. Lévy flight firefly algorithm [ ] | Multiband satellite images from NASA | FSIM = 0.66278, PSNR = 13.7037, MSE = 3376.1616, SSIM = 0.92448
5. Graph based segmentation, Gabor filter [ ] | Quickbird satellite images | Completeness = 92.5%, Correctness = 91%, Quality = 84.7%
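As a quick consistency check, the F-measure reported for the Markov Random Field row above is the harmonic mean of its precision and recall:

```python
precision, recall = 0.7170, 0.7704          # values from the MRF row above
f_measure = 2 * precision * recall / (precision + recall)
print(round(f_measure, 4))                  # 0.7427, matching the reported F-measure
```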
Image fusion methods:

Method | Image Dataset | Mutual Information | Q | Visual Information Fidelity
1. Image cartoon texture decomposition and sparse coding [ ] | NIR and visible IKONOS images | – | 0.357 | 0.3470
2. Contourlet Transform [ ] | SAR and panchromatic images | 9.342 | – | –
3. Expectation Maximization algorithm [ ] | Landsat image set | – | 0.589 | –
4. Dictionary learning method [ ] | Visible and IR image set | 5.778 | 0.660 | 0.578
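Mutual information, used in the fusion comparison above, measures how much intensity information a fused image shares with a source image. A minimal histogram-based estimate in NumPy might look like the following; the images here are random placeholders, and fusion studies often report the sum of MI between the fused image and each source, so exact values depend on the convention used.

```python
import numpy as np

def mutual_information(img_a: np.ndarray, img_b: np.ndarray, bins: int = 64) -> float:
    """Histogram-based estimate of mutual information (in bits) between two images."""
    joint_hist, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    p_xy = joint_hist / joint_hist.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of img_a
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of img_b
    nonzero = p_xy > 0
    return float(np.sum(p_xy[nonzero] * np.log2(p_xy[nonzero] / (p_x @ p_y)[nonzero])))

rng = np.random.default_rng(0)
source = rng.integers(0, 256, size=(128, 128))
fused = np.clip(source + rng.normal(0, 10, source.shape), 0, 255)
print(f"MI(source, fused) = {mutual_information(source, fused):.3f} bits")
```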
Image compression methods:

Method | Test Images | Performance Metrics
1. DWT-DCT [ ] | Landsat 8 images | PSNR, Compression Ratio
2. DWT, entropy encoder [ ] | ETM+ satellite multispectral images | PSNR, Spectral Angle Mapper (SAM)
3. Content based compression [ ] | SAR image | PSNR, Mean Absolute Error (MAE)
4. Adaptive dispersion clustering based compression [ ] | AMSR-2 dataset | Mean Prediction Error Variance Ratio (MPEVR), Root Mean Square Prediction Error (RMSPE)
Image feature detection methods:

Method | Test Images | Application | Performance Metrics
1. Fuzzy logic based detection [ ] | Hyperspectral image | Edge detection | PSNR, MSE
2. Deep Convolutional Neural Network [ ] | QuickBird VHR image | Informal settlement detection | Overall accuracy
3. Invariant pixel detection, PCA [ ] | Landsat 7 ETM+ images | Cloud detection | Accuracy and F-measure
4. Active contour model [ ] | Google Earth images | Shadow detection and height estimation | Precision and recall
Confusion matrix (columns: actual class; rows: classified class):

Class | Water | Forest | Agriculture | Grass | Urban | Bareland | Total
Water | 60 | 0 | 0 | 0 | 0 | 0 | 60
Forest | 0 | 20 | 0 | 0 | 0 | 0 | 20
Agriculture | 0 | 0 | 20 | 0 | 0 | 0 | 20
Grass | 30 | 0 | 10 | 290 | 10 | 0 | 340
Urban | 0 | 0 | 0 | 0 | 90 | 0 | 90
Bareland | 10 | 20 | 60 | 50 | 50 | 750 | 940
Total | 100 | 40 | 90 | 340 | 150 | 750 | 1470
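Overall accuracy and the kappa coefficient, the agreement measures used throughout the accuracy-assessment references above, follow directly from such a matrix. A small NumPy sketch using the numbers in the table:

```python
import numpy as np

# Rows: classified label, columns: actual label
# (Water, Forest, Agriculture, Grass, Urban, Bareland)
confusion = np.array([
    [60,  0,  0,   0,  0,   0],
    [ 0, 20,  0,   0,  0,   0],
    [ 0,  0, 20,   0,  0,   0],
    [30,  0, 10, 290, 10,   0],
    [ 0,  0,  0,   0, 90,   0],
    [10, 20, 60,  50, 50, 750],
])

total = confusion.sum()
overall_accuracy = np.trace(confusion) / total                              # ~0.84
expected = (confusion.sum(axis=1) * confusion.sum(axis=0)).sum() / total**2  # chance agreement
kappa = (overall_accuracy - expected) / (1 - expected)                       # ~0.73
print(f"Overall accuracy = {overall_accuracy:.3f}, kappa = {kappa:.3f}")
```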

Citation: Asokan, A.; Anitha, J.; Ciobanu, M.; Gabor, A.; Naaji, A.; Hemanth, D.J. Image Processing Techniques for Analysis of Satellite Images for Historical Maps Classification—An Overview. Appl. Sci. 2020, 10(12), 4207. https://doi.org/10.3390/app10124207


Techniques for deep learning with satellite & aerial imagery

satellite-image-deep-learning/techniques


👉 satellite-image-deep-learning.com 👈

Introduction

Deep learning has revolutionized the analysis and interpretation of satellite and aerial imagery, addressing unique challenges such as vast image sizes and a wide array of object classes. This repository provides an exhaustive overview of deep learning techniques specifically tailored for satellite and aerial image processing. It covers a range of architectures, models, and algorithms suited for key tasks like classification, segmentation, and object detection.

How to use this repository: use Command + F (Mac) or CTRL + F (Windows) to search this page for e.g. 'SAM'

Contents:

  • Classification
  • Segmentation
  • Object detection
  • Cloud detection & removal
  • Change detection
  • Time series
  • Crop classification
  • Crop yield & vegetation forecasting
  • Generative networks
  • Autoencoders, dimensionality reduction, image embeddings & similarity search
  • Few & zero shot learning
  • Self-supervised, unsupervised & contrastive learning
  • Large vision & language models (LLMs & LVMs)
  • Foundational models

Classification

Classification is a fundamental task in remote sensing data analysis, where the goal is to assign a semantic label to each image, such as 'urban', 'forest', 'agricultural land', etc. The process of assigning labels to an image is known as image-level classification. However, in some cases, a single image might contain multiple different land cover types, such as a forest with a river running through it, or a city with both residential and commercial areas. In these cases, image-level classification becomes more complex and involves assigning multiple labels to a single image. This can be accomplished using a combination of feature extraction and machine learning algorithms to accurately identify the different land cover types. It is important to note that image-level classification should not be confused with pixel-level classification, also known as semantic segmentation. While image-level classification assigns a single label to an entire image, semantic segmentation assigns a label to each individual pixel in an image, resulting in a highly detailed and accurate representation of the land cover types in an image. Read A brief introduction to satellite image classification with neural networks
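The difference between single-label and multi-label image-level classification comes down to the output targets and the loss. A minimal PyTorch sketch of that distinction is given below; the class count and the data are placeholders, no pretrained weights are loaded, and this is not the code of any specific project listed here.

```python
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # placeholder number of land-cover scene categories

# Backbone with a fresh classification head (weights=None avoids a download;
# in practice one would usually start from ImageNet-pretrained weights)
backbone = models.resnet18(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

images = torch.randn(8, 3, 224, 224)               # dummy batch of image chips
logits = backbone(images)                          # shape: (8, num_classes)

# Single-label (image-level) classification: exactly one class per chip
single_targets = torch.randint(0, num_classes, (8,))
single_loss = nn.CrossEntropyLoss()(logits, single_targets)

# Multi-label classification: a chip may contain several land-cover types
multi_targets = torch.randint(0, 2, (8, num_classes)).float()
multi_loss = nn.BCEWithLogitsLoss()(logits, multi_targets)

print(single_loss.item(), multi_loss.item())
```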

Land classification on Sentinel 2 data using a simple sklearn cluster algorithm or deep learning CNN

Multi-Label Classification of Satellite Photos of the Amazon Rainforest using keras or FastAI

EuroSat-Satellite-CNN-and-ResNet -> Classifying custom image datasets by creating Convolutional Neural Networks and Residual Networks from scratch with PyTorch

Detecting Informal Settlements from Satellite Imagery using fine-tuning of ResNet-50 classifier with repo

Land-Cover-Classification-using-Sentinel-2-Dataset -> well written Medium article accompanying this repo but using the EuroSAT dataset

Land Cover Classification of Satellite Imagery using Convolutional Neural Networks using Keras and a multi spectral dataset captured over vineyard fields of Salinas Valley, California

Detecting deforestation from satellite images -> using FastAI and ResNet50, with repo fsdl_deforestation_detection

Neural Network for Satellite Data Classification Using Tensorflow in Python -> A step-by-step guide for Landsat 5 multispectral data classification for binary built-up/non-built-up class prediction, with repo

Slums mapping from pretrained CNN network on VHR (Pleiades: 0.5m) and MR (Sentinel: 10m) imagery

Comparing urban environments using satellite imagery and convolutional neural networks -> includes interesting study of the image embedding features extracted for each image on the Urban Atlas dataset

RSI-CB -> A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data. See also Remote-sensing-image-classification

NAIP_PoolDetection -> modelled as an object recognition problem, a CNN is used to identify images as being swimming pools or something else - specifically a street, rooftop, or lawn

Land Use and Land Cover Classification using a ResNet Deep Learning Architecture -> uses fastai and the EuroSAT dataset

Vision Transformers Use Case: Satellite Image Classification without CNNs

WaterNet -> a CNN that identifies water in satellite images

Road-Network-Classification -> Road network classification model using ResNet-34, road classes organic, gridiron, radial and no pattern

Scaling AI to map every school on the planet

Landsat classification CNN tutorial with repo

satellite-crosswalk-classification

Understanding the Amazon Rainforest with Multi-Label Classification + VGG-19, Inceptionv3, AlexNet & Transfer Learning

Implementation of the 3D-CNN model for land cover classification -> uses the Sundarbans dataset, with repo

SSTN -> Spectral-Spatial Transformer Network for Hyperspectral Image Classification: A FAS Framework

SatellitePollutionCNN -> A novel algorithm to predict air pollution levels with state-of-the-art accuracy using deep learning and GoogleMaps satellite images

PropertyClassification -> Classifying the type of property given Real Estate, satellite and Street view Images

remote-sense-quickstart -> classification on a number of datasets, including with attention visualization

Satellite image classification using multiple machine learning algorithms

satsense -> land use/cover classification using classical features including HoG & NDVI

PyTorch_UCMerced_LandUse

EuroSAT-image-classification

landcover_classification -> using fast.ai on EuroSAT

IGARSS2020_BWMS -> Band-Wise Multi-Scale CNN Architecture for Remote Sensing Image Scene Classification with a novel CNN architecture for the feature embedding of high-dimensional RS images

image.classification.on.EuroSAT -> solution in pure pytorch

hurricane_damage -> Post-hurricane structure damage assessment based on aerial imagery

openai-drivendata-challenge -> Using deep learning to classify the building material of rooftops (aerial imagery from South America)

is-it-abandoned -> Can we tell if a house is abandoned based on aerial LIDAR imagery?

BoulderAreaDetector -> CNN to classify whether a satellite image shows an area would be a good rock climbing spot or not

ISPRS_S2FL -> Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model

Brazilian-Coffee-Detection -> uses Keras with public dataset

tf-crash-severity -> predict the crash severity for given road features contained within satellite images

ensemble_LCLU -> Deep neural network ensembles for remote sensing land cover and land use classification

cerraNet -> contextually classify the types of use and coverage in the Brazilian Cerrado

Urban-Analysis-Using-Satellite-Imagery -> classify urban area as planned or unplanned using a combination of segmentation and classification

ChipClassification -> Deep learning for multi-modal classification of cloud, shadow and land cover scenes in PlanetScope and Sentinel-2 imagery

DeeplearningClassficationLandsat-tImages -> Water/Ice/Land Classification Using Large-Scale Medium Resolution Landsat Satellite Images

wildfire-detection-from-satellite-images-ml -> detect whether an image contains a wildfire, with example flask web app

mining-discovery-with-deep-learning -> Mining and Tailings Dam Detection in Satellite Imagery Using Deep Learning

e-Farmerce-platform -> classify crop type

sentinel2-deep-learning -> Novel Training Methodologies for Land Classification of Sentinel-2 Imagery

RSSC-transfer -> The Role of Pre-Training in High-Resolution Remote Sensing Scene Classification

Classifying Geo-Referenced Photos and Satellite Images for Supporting Terrain Classification -> detect floods

Pay-More-Attention -> Remote Sensing Image Scene Classification Based on an Enhanced Attention Module

Remote Sensing Image Classification via Improved Cross-Entropy Loss and Transfer Learning Strategy Based on Deep Convolutional Neural Networks

DenseNet40-for-HRRSISC -> DenseNet40 for remote sensing image scene classification, uses UC Merced Dataset

SKAL -> Looking Closer at the Scene: Multiscale Representation Learning for Remote Sensing Image Scene Classification

potsdam-tensorflow-practice -> image classification of Potsdam dataset using tensorflow

SAFF -> Self-Attention-Based Deep Feature Fusion for Remote Sensing Scene Classification

GLNET -> Convolutional Neural Networks Based Remote Sensing Scene Classification under Clear and Cloudy Environments

Remote-sensing-image-classification -> transfer learning using pytorch to classify remote sensing data into three classes: aircrafts, ships, none

remote_sensing_pretrained_models -> as an alternative to fine tuning on models pretrained on ImageNet, here some CNN are pretrained on the RSD46-WHU & AID datasets

CNN_AircraftDetection -> CNN for aircraft detection in satellite images using keras

OBIC-GCN -> Object-based Classification Framework of Remote Sensing Images with Graph Convolutional Networks

aitlas-arena -> An open-source benchmark framework for evaluating state-of-the-art deep learning approaches for image classification in Earth Observation (EO)

droughtwatch -> Satellite-based Prediction of Forage Conditions for Livestock in Northern Kenya

JSTARS_2020_DPN-HRA -> Deep Prototypical Networks With Hybrid Residual Attention for Hyperspectral Image Classification

SIGNA -> Semantic Interleaving Global Channel Attention for Multilabel Remote Sensing Image Classification

Satellite Image Classification using rmldnn and Sentinel 2 data

PBDL -> Patch-Based Discriminative Learning for Remote Sensing Scene Classification

EmergencyNet -> identify fire and other emergencies from a drone

satellite-deforestation -> Using Satellite Imagery to Identify the Leading Indicators of Deforestation, applied to the Kaggle Challenge Understanding the Amazon from Space

RSMLC -> Deep Network Architectures as Feature Extractors for Multi-Label Classification of Remote Sensing Images

FireRisk -> A Remote Sensing Dataset for Fire Risk Assessment with Benchmarks Using Supervised and Self-supervised Learning

flood_susceptibility_mapping -> Towards urban flood susceptibility mapping using data-driven models in Berlin, Germany

tick-tick-bloom -> Winners of the Tick Tick Bloom: Harmful Algal Bloom Detection Challenge. Task was to predict severity of algae bloom, winners used decision trees

Estimating coal power plant operation from satellite images with computer vision -> use Sentinel 2 data to identify if a coal power plant is on or off, with dataset and repo

Building-detection-and-roof-type-recognition -> A CNN-Based Approach for Automatic Building Detection and Recognition of Roof Types Using a Single Aerial Image

Performance Comparison of Multispectral Channels for Land Use Classification -> Implemented ResNet-50, ResNet-101, ResNet-152, Vision Transformer on RGB and multispectral versions of EuroSAT dataset.

SNN4Space -> project which investigates the feasibility of deploying spiking neural networks (SNN) in land cover and land use classification tasks

vessel-classification -> classify vessels and identify fishing behavior based on AIS data

RSMamba -> Remote Sensing Image Classification with State Space Model

BirdSAT -> Cross-View Contrastive Masked Autoencoders for Bird Species Classification and Mapping

EGNNA_WND -> Estimating the presence of the West Nile Disease employing Graph Neural network

cyfi -> Estimate cyanobacteria density based on Sentinel-2 satellite imagery

Segmentation

Image segmentation is a crucial step in image analysis and computer vision, with the goal of dividing an image into semantically meaningful segments or regions. The process of image segmentation assigns a class label to each pixel in an image, effectively transforming an image from a 2D grid of pixels into a 2D grid of pixels with assigned class labels. One common application of image segmentation is road or building segmentation, where the goal is to identify and separate roads and buildings from other features within an image. To accomplish this task, single class models are often trained to differentiate between roads and background, or buildings and background. These models are designed to recognize specific features, such as color, texture, and shape, that are characteristic of roads or buildings, and use this information to assign class labels to the pixels in an image. Another common application of image segmentation is land use or crop type classification, where the goal is to identify and map different land cover types within an image. In this case, multi-class models are typically used to recognize and differentiate between multiple classes within an image, such as forests, urban areas, and agricultural land. These models are capable of recognizing complex relationships between different land cover types, allowing for a more comprehensive understanding of the image content. Read A brief introduction to satellite image segmentation with neural networks . Note that many articles which refer to 'hyperspectral land classification' are often actually describing semantic segmentation. Image source
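To make the per-pixel labelling idea concrete, here is a deliberately tiny encoder-decoder sketch in PyTorch. It is a toy example, not one of the models listed below; real land-cover or road models in this section typically use U-Net-style architectures trained on the referenced datasets.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Minimal encoder-decoder mapping an image to per-pixel class logits."""
    def __init__(self, in_channels: int = 3, num_classes: int = 6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                     # halves spatial resolution
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU(),  # restores resolution
            nn.Conv2d(32, num_classes, 1),                       # one logit per class per pixel
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

model = TinySegNet(num_classes=6)              # e.g. 6 land-cover classes (placeholder)
images = torch.randn(2, 3, 256, 256)           # dummy batch of RGB chips
masks = torch.randint(0, 6, (2, 256, 256))     # dummy per-pixel labels
logits = model(images)                         # shape: (2, 6, 256, 256)
loss = nn.CrossEntropyLoss()(logits, masks)    # standard multi-class segmentation loss
predicted = logits.argmax(dim=1)               # per-pixel class map, shape: (2, 256, 256)
print(loss.item(), predicted.shape)
```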

Segmentation - Land use & land cover

U-Net for Semantic Segmentation on Unbalanced Aerial Imagery -> using the Dubai dataset

Semantic Segmentation of Dubai dataset Using a TensorFlow U-Net Model

nga-deep-learning -> performs semantic segmentation on high resolution GeoTIFF data using a modified U-Net & Keras, published by NASA researchers

Automatic Detection of Landfill Using Deep Learning

SpectralNET -> a 2D wavelet CNN for Hyperspectral Image Classification, uses Salinas Scene dataset & Keras

laika -> The goal of this repo is to research potential sources of satellite image data and to implement various algorithms for satellite image segmentation

PEARL -> a human-in-the-loop AI tool to drastically reduce the time required to produce an accurate Land Use/Land Cover (LULC) map, blog post, uses Microsoft Planetary Computer and ML models run locally in the browser. Code for backend and frontend

Land Cover Classification with U-Net -> Satellite Image Multi-Class Semantic Segmentation Task with PyTorch Implementation of U-Net, uses DeepGlobe Land Cover Segmentation dataset, with code

Multi-class semantic segmentation of satellite images using U-Net using DSTL dataset, tensorflow 1 & python 2.7. Accompanying article

Codebase for multi class land cover classification with U-Net accompanying a masters thesis, uses Keras

dubai-satellite-imagery-segmentation -> due to the small dataset, image augmentation was used

CDL-Segmentation -> Deep Learning Based Land Cover and Crop Type Classification: A Comparative Study. Compares UNet, SegNet & DeepLabv3+

LoveDA -> A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation

Satellite Imagery Semantic Segmentation with CNN -> 7 different segmentation classes, DeepGlobe Land Cover Classification Challenge dataset, with repo

Aerial Semantic Segmentation using U-Net Deep Learning Model medium article, with repo

UNet-Satellite-Image-Segmentation -> A Tensorflow implementation of a light UNet semantic segmentation framework

DeepGlobe Land Cover Classification Challenge solution

Semantic-segmentation-with-PyTorch-Satellite-Imagery -> predict 25 classes on RGB imagery taken to assess the damage after Hurricane Harvey

Semantic Segmentation With Sentinel-2 Imagery -> uses LandCoverNet dataset and fast.ai

CNN_Enhanced_GCN -> CNN-Enhanced Graph Convolutional Network With Pixel- and Superpixel-Level Feature Fusion for Hyperspectral Image Classification

LULCMapping-WV3images-CORINE-DLMethods -> Land Use and Land Cover Mapping Using Deep Learning Based Segmentation Approaches and VHR Worldview-3 Images

MCANet -> A joint semantic segmentation framework of optical and SAR images for land use classification. Uses WHU-OPT-SAR-dataset

land-cover -> Model Generalization in Deep Learning Applications for Land Cover Mapping

generalizablersc -> Cross-dataset Learning for Generalizable Land Use Scene Classification

Large-scale-Automatic-Identification-of-Urban-Vacant-Land -> Large-scale automatic identification of urban vacant land using semantic segmentation of high-resolution remote sensing images

SSLTransformerRS -> Self-supervised Vision Transformers for Land-cover Segmentation and Classification

aerial-tile-segmentation -> Large satellite image semantic segmentation into 6 classes using Tensorflow 2.0 and ISPRS benchmark dataset

DCSA-Net -> Dynamic Convolution Self-Attention Network for Land-Cover Classification in VHR Remote-Sensing Images

CHeGCN-CNN_enhanced_Heterogeneous_Graph -> CNN-Enhanced Heterogeneous Graph Convolutional Network: Inferring Land Use from Land Cover with a Case Study of Park Segmentation

TCSVT_2022_DGSSC -> DGSSC: A Deep Generative Spectral-Spatial Classifier for Imbalanced Hyperspectral Imagery

DeepForest-Wetland-Paper -> Deep Forest classifier for wetland mapping using the combination of Sentinel-1 and Sentinel-2 data, GIScience & Remote Sensing

Wetland_UNet -> UNet models that can delineate wetlands using remote sensing data input including bands from Sentinel-2 LiDAR and geomorphons. By the Conservation Innovation Center of Chesapeake Conservancy and Defenders of Wildlife

DPA -> DPA is an unsupervised domain adaptation (UDA) method applied to different satellite images for large-scale land cover mapping.

dynamicworld -> Dynamic World, Near real-time global 10 m land use land cover mapping

spada -> Land Cover Segmentation with Sparse Annotations from Sentinel-2 Imagery

M3SPADA -> Multi-Sensor Temporal Unsupervised Domain Adaptation for Land Cover Mapping with spatial pseudo labelling and adversarial learning

GLNet -> Collaborative Global-Local Networks for Memory-Efficient Segmentation of Ultra-High Resolution Images

LoveNAS -> LoveNAS: Towards Multi-Scene Land-Cover Mapping via Hierarchical Searching Adaptive Network

FLAIR-2 challenge -> Semantic segmentation and domain adaptation challenge proposed by the French National Institute of Geographical and Forest Information (IGN)

flair-2 8th place solution

Segmentation - Water, coastlines, rivers & floods

pytorch-waterbody-segmentation -> UNET model trained on the Satellite Images of Water Bodies dataset from Kaggle. The model is deployed on Hugging Face Spaces

Flood Detection and Analysis using UNET with Resnet-34 as the backbone, uses fastai

Automatic Flood Detection from Satellite Images Using Deep Learning

UNSOAT used fastai to train a Unet to perform semantic segmentation on satellite imagery to detect water

Semi-Supervised Classification and Segmentation on High Resolution Aerial Images - Solving the FloodNet problem

Houston_flooding -> labeling each pixel as either flooded or not using data from Hurricane Harvey. Dataset consisted of pre and post flood images, and a ground truth floodwater mask was created using unsupervised clustering (with DBScan) of image pixels with human cluster verification/adjustment
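As a rough sketch of the unsupervised pixel-clustering idea mentioned in the entry above (not the repository's actual pipeline), pixel spectra can be clustered with scikit-learn's DBSCAN and the resulting clusters inspected manually; the data and parameters below are placeholders.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Placeholder chip: two spectrally distinct pixel populations standing in for land and water
rng = np.random.default_rng(0)
land = rng.normal(loc=[0.4, 0.5, 0.3], scale=0.02, size=(3000, 3))
water = rng.normal(loc=[0.1, 0.2, 0.4], scale=0.02, size=(1096, 3))
pixels = np.vstack([land, water])

# Cluster pixel spectra; eps and min_samples would need tuning for real imagery
labels = DBSCAN(eps=0.05, min_samples=20).fit_predict(pixels)
print(np.unique(labels, return_counts=True))   # -1 marks noise pixels

# A human would then inspect the clusters (e.g. reshaped back to the image grid)
# and mark which ones correspond to floodwater.
```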

ml4floods -> An ecosystem of data, models and code pipelines to tackle flooding with ML

A comprehensive guide to getting started with the ETCI Flood Detection competition -> using Sentinel1 SAR & pytorch

Map Floodwater of SAR Imagery with SageMaker -> applied to Sentinel-1 dataset

1st place solution for STAC Overflow: Map Floodwater from Radar Imagery hosted by Microsoft AI for Earth -> combines Unet with Catboostclassifier, taking their maxima, not the average

hydra-floods -> an open source Python application for downloading, processing, and delivering surface water maps derived from remote sensing data

CoastSat -> tool for mapping coastlines which has an extension CoastSeg using segmentation models

Satellite_Flood_Segmentation_of_Harvey -> explores both deep learning and traditional kmeans

Flood Event Detection Utilizing Satellite Images

ETCI-2021-Competition-on-Flood-Detection -> Experiments on Flood Segmentation on Sentinel-1 SAR Imagery with Cyclical Pseudo Labeling and Noisy Student Training

FDSI -> Flood Detection in Satellite Images - 2017 Multimedia Satellite Task

deepwatermap -> a deep model that segments water on multispectral images

rivamap -> an automated river analysis and mapping engine

deep-water -> track changes in water level

WatNet -> A deep ConvNet for surface water mapping based on Sentinel-2 image, uses the Earth Surface Water Dataset

A-U-Net-for-Flood-Extent-Mapping

floatingobjects -> Towards Detecting Floating Objects on a Global Scale with Learned Spatial Features Using Sentinel 2. Uses U-Net & pytorch

SpaceNet8 -> baseline Unet solution to detect flooded roads and buildings

dlsim -> Breaking the Limits of Remote Sensing by Simulation and Deep Learning for Flood and Debris Flow Mapping

Water-HRNet -> HRNet trained on Sentinel 2

semantic segmentation model to identify newly developed or flooded land using NAIP imagery provided by the Chesapeake Conservancy, training on MS Azure

BandNet -> Analysis and application of multispectral data for water segmentation using machine learning. Uses Sentinel-2 data

mmflood -> MMFlood: A Multimodal Dataset for Flood Delineation From Satellite Imagery (Sentinel 1 SAR)

Urban_flooding -> Towards transferable data-driven models to predict urban pluvial flood water depth in Berlin, Germany

Flood-Mapping-Using-Satellite-Images -> masters thesis comparing Random Forest & Unet

MECNet -> Rich CNN features for water-body segmentation from very high resolution aerial and satellite imagery

SWRNET -> A Deep Learning Approach for Small Surface Water Area Recognition Onboard Satellite

elwha-segmentation -> fine-tuning Meta's Segment Anything (SAM) for bird's eye view river pixel segmentation, with Medium article

RiverSnap -> code for paper: A Comparative Performance Analysis of Popular Deep Learning Models and Segment Anything Model (SAM) for River Water Segmentation in Close-Range Remote Sensing Imagery

Segmentation - Fire, smoke & burn areas

SatelliteVu-AWS-Disaster-Response-Hackathon -> fire spread prediction using classical ML & deep learning

Wild Fire Detection using U-Net trained on Databricks & Keras, semantic segmentation

A Practical Method for High-Resolution Burned Area Monitoring Using Sentinel-2 and VIIRS

AI Geospatial Wildfire Risk Prediction -> A predictive model using geospatial raster data to assess wildfire hazard potential over the contiguous United States using Unet

IndustrialSmokePlumeDetection -> using Sentinel-2 & a modified ResNet-50

burned-area-detection -> uses Sentinel-2

rescue -> Attention to fires: multi-channel deep-learning models for wildfire severity prediction

smoke_segmentation -> Segmenting smoke plumes and predicting density from GOES imagery

wildfire-detection -> Using Vision Transformers for enhanced wildfire detection in satellite images

Burned_Area_Detection -> Detecting Burned Areas with Sentinel-2 data

burned-area-baseline -> baseline unet model accompanying the Satellite Burned Area Dataset (Sentinel 1 & 2)

burned-area-seg -> Burned area segmentation from Sentinel-2 using multi-task learning

chabud2023 -> Change detection for Burned area Delineation (ChaBuD) ECML/PKDD 2023 challenge

Post Wildfire Burnt-up Detection using Siamese-UNet -> on the ChaBuD dataset

vit-burned-detection -> Vision transformers in burned area delineation

Segmentation - Landslides

landslide-sar-unet -> Deep Learning for Rapid Landslide Detection using Synthetic Aperture Radar (SAR) Datacubes

landslide-mapping-with-cnn -> A new strategy to map landslides with a generalized convolutional neural network

Relict_landslides_CNN_kmeans -> Relict landslide detection in rainforest areas using a combination of k-means clustering algorithm and Deep-Learning semantic segmentation models

Landslide-mapping-on-SAR-data-by-Attention-U-Net -> Rapid Mapping of landslide on SAR data by Attention U-net

SAR-landslide-detection-pretraining -> SAR-based landslide classification pretraining leads to better segmentation

Landslide mapping from Sentinel-2 imagery through change detection

Segmentation - Glaciers

HED-UNet -> a model for simultaneous semantic segmentation and edge detection, examples provided are glacier fronts and building footprints using the Inria Aerial Image Labeling dataset

glacier_mapping -> Mapping glaciers in the Hindu Kush Himalaya, Landsat 7 images, Shapefile labels of the glaciers, Unet with dropout

glacier-detect-ML -> a simple logistic regression model to identify a glacier in Landsat satellite imagery

GlacierSemanticSegmentation

Antarctic-fracture-detection -> uses UNet with the MODIS Mosaic of Antarctica to detect surface fractures

Segmentation - Other environmental

Detection of Open Landfills -> uses Sentinel-2 to detect large changes in the Normalized Burn Ratio (NBR)

sea_ice_remote_sensing -> Sea Ice Concentration classification

Methane-detection-from-hyperspectral-imagery -> Deep Remote Sensing Methods for Methane Detection in Overhead Hyperspectral Imagery

methane-emission-project -> Classification CNNs was combined in an ensemble approach with traditional methods on tabular data

CH4Net -> A fast, simple model for detection of methane plumes using sentinel-2

EddyNet -> A Deep Neural Network For Pixel-Wise Classification of Oceanic Eddies

schisto-vegetation -> Deep Learning Segmentation of Satellite Imagery Identifies Aquatic Vegetation Associated with Snail Intermediate Hosts of Schistosomiasis in Senegal, Africa

Earthformer -> Exploring space-time transformers for earth system forecasting

weather4cast-2022 -> Unet-3D baseline model for Weather4cast Rain Movie Prediction competition

WeatherFusionNet -> Predicting Precipitation from Satellite Data. weather4cast-2022 1st place solution

marinedebrisdetector -> Large-scale Detection of Marine Debris in Coastal Areas with Sentinel-2

kaggle-identify-contrails-4th -> 4th place Solution, Google Research - Identify Contrails to Reduce Global Warming

MineSegSAT -> An automated system to evaluate mining disturbed area extents from Sentinel-2 imagery

STARCOP: Semantic Segmentation of Methane Plumes with Hyperspectral Machine Learning models

asos -> Recognizing protected and anthropogenic patterns in landscapes using interpretable machine learning and satellite imagery

Segmentation - Roads & sidewalks

Extracting roads is challenging due to the occlusions caused by other objects and the complex traffic environment

ChesapeakeRSC -> segmentation to extract roads from the background, with models additionally evaluated by how they perform on the "Tree Canopy Over Road" class

Road detection using semantic segmentation and albumentations for data augmentation using the Massachusetts Roads Dataset, U-net & Keras. With code
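Since several road-segmentation entries here rely on albumentations for augmentation, a generic image-and-mask pipeline (a sketch under assumed transforms, not that project's exact configuration) looks roughly like:

```python
import numpy as np
import albumentations as A

# The same spatial transforms are applied jointly to the image and its road mask
transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomRotate90(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
])

# Placeholder 3-band chip and binary road mask
image = np.random.randint(0, 256, (512, 512, 3), dtype=np.uint8)
mask = np.random.randint(0, 2, (512, 512), dtype=np.uint8)

augmented = transform(image=image, mask=mask)
aug_image, aug_mask = augmented["image"], augmented["mask"]
print(aug_image.shape, aug_mask.shape)
```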

ML_EPFL_Project_2 -> U-Net in Pytorch to perform semantic segmentation of roads on satellite images

Semantic Segmentation of roads using U-net Keras, OSM data, project summary article by student, no code

Winning Solutions from SpaceNet Road Detection and Routing Challenge

RoadVecNet -> Road-Network-Segmentation-and-Vectorization in keras with dataset

Detecting road and road types jupyter notebook

awesome-deep-map -> A curated list of resources dedicated to deep learning / computer vision algorithms for mapping. The mapping problems include road network inference, building footprint extraction, etc.

RoadTracer: Automatic Extraction of Road Networks from Aerial Images -> uses an iterative search process guided by a CNN-based decision function to derive the road network graph directly from the output of the CNN

road_detection_mtl -> Road Detection using a multi-task Learning technique to improve the performance of the road detection task by incorporating prior knowledge constraints, uses the SpaceNet Roads Dataset

road_connectivity -> Improved Road Connectivity by Joint Learning of Orientation and Segmentation (CVPR2019)

Road-Network-Extraction using classical Image processing -> blur & canny edge detection

SPIN_RoadMapper -> Extracting Roads from Aerial Images via Spatial and Interaction Space Graph Reasoning for Autonomous Driving

road_extraction_remote_sensing -> pytorch implementation, CVPR2018 DeepGlobe Road Extraction Challenge submission. See also DeepGlobe-Road-Extraction-Challenge

RoadDetections dataset by Microsoft

CoANet -> Connectivity Attention Network for Road Extraction From Satellite Imagery. The CoA module incorporates graphical information to ensure the connectivity of roads are better preserved

Satellite Imagery Road Segmentation -> intro article on Medium using the kaggle Massachusetts Roads Dataset

Label-Pixels -> for semantic segmentation of roads and other features

Satellite-image-road-extraction -> Road Extraction by Deep Residual U-Net

road_building_extraction -> Pytorch implementation of U-Net architecture for road and building extraction

RCFSNet -> Road Extraction From Satellite Imagery by Road Context and Full-Stage Feature

SGCN -> Split Depth-Wise Separable Graph-Convolution Network for Road Extraction in Complex Environments From High-Resolution Remote-Sensing Images

ASPN -> Road Segmentation for Remote Sensing Images using Adversarial Spatial Pyramid Networks

FCNs-for-road-extraction-keras -> Road extraction of high-resolution remote sensing images based on various semantic segmentation networks

cresi -> Road network extraction from satellite imagery, with speed and travel time estimates

D-LinkNet -> LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction

Sat2Graph -> Road Graph Extraction through Graph-Tensor Encoding

Image-Segmentation -> using Massachusetts Road dataset and fast.ai

RoadTracer-M -> Road Network Extraction from Satellite Images Using CNN Based Segmentation and Tracing

ScRoadExtractor -> Scribble-based Weakly Supervised Deep Learning for Road Surface Extraction from Remote Sensing Images

RoadDA -> Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training for Road Segmentation of Remote Sensing Images

DeepSegmentor -> A Pytorch implementation of DeepCrack and RoadNet projects

Cascaded Residual Attention Enhanced Road Extraction from Remote Sensing Images

NL-LinkNet -> Toward Lighter but More Accurate Road Extraction with Non-Local Operations

IRSR-net -> Lightweight Remote Sensing Road Detection Network

hironex -> A python tool for automatic, fully unsupervised extraction of historical road networks from historical maps

Road_detection_model -> Mapping Roads in the Brazilian Amazon with Artificial Intelligence and Sentinel-2

DTnet -> Road detection via a dual-task network based on cross-layer graph fusion modules

Automatic-Road-Extraction-from-Historical-Maps-using-Deep-Learning-Techniques -> Automatic Road Extraction from Historical Maps using Deep Learning Techniques

Istanbul_Dataset -> segmentation on the Istanbul, Inria and Massachusetts datasets

Road-Segmentation -> Road segmentation on Satellite Images using CNN (U-Nets and FCN8) and Logistic Regression

D-LinkNet -> 1st place solution in DeepGlobe Road Extraction Challenge

PaRK-Detect -> PaRK-Detect: Towards Efficient Multi-Task Satellite Imagery Road Extraction via Patch-Wise Keypoints Detection

tile2net -> Mapping the walk: A scalable computer vision approach for generating sidewalk network datasets from aerial imagery

AerialLaneNet -> Building Lane-Level Maps from Aerial Images, introduces the AErial Lane (AEL) Dataset: a first large-scale aerial image dataset built for lane detection

sam_road -> Segment Anything Model (SAM) for large-scale, vectorized road network extraction from aerial imagery.

LRDNet -> A Lightweight Road Detection Algorithm Based on Multiscale Convolutional Attention Network and Coupled Decoder Head

Fine-Grained Extraction of Road Networks via Joint Learning of Connectivity and Segmentation -> uses SpaceNet 3 dataset

Segmentation - Buildings & rooftops

Road and Building Semantic Segmentation in Satellite Imagery uses U-Net on the Massachusetts Roads Dataset & keras

find unauthorized constructions using aerial photography -> Dataset creation

SRBuildSeg -> Making low-resolution satellite images reborn: a deep learning approach for super-resolution building extraction

Building footprint detection with fastai on the challenging SpaceNet7 dataset uses U-Net & fastai

Pix2Pix-for-Semantic-Segmentation-of-Satellite-Images -> using Pix2Pix GAN network to segment the building footprint from Satellite Images, uses tensorflow

SpaceNetUnet -> Baseline model is U-net like, applied to SpaceNet Vegas data, using Keras

automated-building-detection -> Input: very-high-resolution (<= 0.5 m/pixel) RGB satellite images. Output: buildings in vector format (geojson), to be used in digital map products. Built on top of robosat and robosat.pink.

project_sunroof_india -> Analyzed Google Satellite images to generate a report on individual house rooftops' solar power potential, uses a range of classical computer vision techniques (e.g. Canny Edge Detection) to segment the roofs

JointNet-A-Common-Neural-Network-for-Road-and-Building-Extraction

Mapping Africa’s Buildings with Satellite Imagery: Google AI blog post. See the open-buildings dataset

nz_convnet -> A U-net based ConvNet for New Zealand imagery to classify building outlines

polycnn -> End-to-End Learning of Polygons for Remote Sensing Image Classification

spacenet_building_detection solution by motokimura using Unet

Vec2Instance -> applied to the SpaceNet challenge AOI 2 (Vegas) building footprint dataset, tensorflow v1.12

EarthquakeDamageDetection -> Building segmentation from satellite imagery and damage classification for each building, using Keras

Semantic-segmentation repo by fuweifu-vtoo -> uses pytorch and the Massachusetts Buildings & Roads Datasets

Extracting buildings and roads from AWS Open Data using Amazon SageMaker -> With repo

TF-SegNet -> AirNet is a segmentation network based on SegNet, but with some modifications

rgb-footprint-extract -> a Semantic Segmentation Network for Urban-Scale Building Footprint Extraction Using RGB Satellite Imagery, DeepLabV3+ module with a Dilated ResNet C42 backbone

SpaceNetExploration -> A sample project demonstrating how to extract building footprints from satellite images using a semantic segmentation model. Data from the SpaceNet Challenge

Rooftop-Instance-Segmentation -> VGG-16, Instance Segmentation, uses the AIRS dataset

solar-farms-mapping -> An Artificial Intelligence Dataset for Solar Energy Locations in India

poultry-cafos -> This repo contains code for detecting poultry barns from high-resolution aerial imagery and an accompanying dataset of predicted barns over the United States

ssai-cnn -> This is an implementation of Volodymyr Mnih's dissertation methods on his Massachusetts road & building dataset

Remote-sensing-building-extraction-to-3D-model-using-Paddle-and-Grasshopper

segmentation-enhanced-resunet -> Urban building extraction in Daejeon region using Modified Residual U-Net (Modified ResUnet) and applying post-processing

Mask RCNN for Spacenet Off Nadir Building Detection

GRSL_BFE_MA -> Deep Learning-based Building Footprint Extraction with Missing Annotations using a novel loss function

FER-CNN -> Detection, Classification and Boundary Regularization of Buildings in Satellite Imagery Using Faster Edge Region Convolutional Neural Networks

UNET-Image-Segmentation-Satellite-Picture -> Unet to predict rooftops on the CrowdAI Mapping dataset, uses keras

Vector-Map-Generation-from-Aerial-Imagery-using-Deep-Learning-GeoSpatial-UNET -> applied to very large (> 10k x 10k pixels) geo-referenced images

building-footprint-segmentation -> pip installable library to train building footprint segmentation on satellite and aerial imagery, applied to Massachusetts Buildings Dataset and Inria Aerial Image Labeling Dataset

SemSegBuildings -> Project using fast.ai framework for semantic segmentation on Inria building segmentation dataset

FCNN-example -> overfit to a given single image to detect houses

SAT2LOD2 -> an open-source, python-based GUI-enabled software that takes the satellite images as inputs and returns LoD2 building models as outputs

SatFootprint -> building segmentation on the Spacenet 7 dataset

Building-Detection -> Raster Vision experiment to train a model to detect buildings from satellite imagery in three cities in Latin America

Multi-building-tracker -> Multi-target building tracker for satellite images using deep learning

Boundary Enhancement Semantic Segmentation for Building Extraction

keras code for binary semantic segmentation

Spacenet-Building-Detection

LGPNet-BCD -> Building Change Detection for VHR Remote Sensing Images via Local-Global Pyramid Network and Cross-Task Transfer Learning Strategy

MTL_homoscedastic_SRB -> A Multi-Task Deep Learning Framework for Building Footprint Segmentation

UNet_CNN -> UNet model to segment building coverage in Boston using Remote sensing data, uses keras

FDANet -> Full-Level Domain Adaptation for Building Extraction in Very-High-Resolution Optical Remote-Sensing Images

CBRNet -> A Coarse-to-fine Boundary Refinement Network for Building Extraction from Remote Sensing Imagery

ASLNet -> Adversarial Shape Learning for Building Extraction in VHR Remote Sensing Images

BRRNet -> A Fully Convolutional Neural Network for Automatic Building Extraction From High-Resolution Remote Sensing Images

Multi-Scale-Filtering-Building-Index -> A Multi-Scale Filtering Building Index for Building Extraction in Very High-Resolution Satellite Imagery

Models for Remote Sensing -> long list of unets etc applied to building detection

boundary_loss_for_remote_sensing -> Boundary Loss for Remote Sensing Imagery Semantic Segmentation

Open Cities AI Challenge -> Segmenting Buildings for Disaster Resilience. Winning solutions on Github

MAPNet -> Multi Attending Path Neural Network for Building Footprint Extraction from Remote Sensed Imagery

dual-hrnet -> localizing buildings and classifying their damage level

ESFNet -> Efficient Network for Building Extraction from High-Resolution Aerial Images

rooftop-detection-python -> Detect Rooftops from low resolution satellite images and calculate area for cultivation and solar panel installment using classical computer vision techniques

keras_segmentation_models -> Using Open Vector-Based Spatial Data to Create Semantic Datasets for Building Segmentation for Raster Data

CVCMFFNet -> Complex-Valued Convolutional and Multifeature Fusion Network for Building Semantic Segmentation of InSAR Images

STEB-UNet -> A Swin Transformer-Based Encoding Booster Integrated in U-Shaped Network for Building Extraction

dfc2020_baseline -> Baseline solution for the IEEE GRSS Data Fusion Contest 2020. Predict land cover labels from Sentinel-1 and Sentinel-2 imagery

Fusing multiple segmentation models based on different datasets into a single edge-deployable model -> roof, car & road segmentation

ground-truth-gan-segmentation -> use Pix2Pix to segment the footprint of a building. The dataset used is AIRS

UNICEF-Giga_Sudan -> Detecting school lots from satellite imagery in Southern Sudan using a UNET segmentation model

building_footprint_extraction -> The project retrieves satellite imagery from Google and performs building footprint extraction using a U-Net.

projectRegularization -> Regularization of building boundaries in satellite images using adversarial and regularized losses

PolyWorldPretrainedNetwork -> Polygonal Building Extraction with Graph Neural Networks in Satellite Images

dl_image_segmentation -> Uncertainty-Aware Interpretable Deep Learning for Slum Mapping and Monitoring. Uses SHAP

UBC-dataset -> a dataset for building detection and classification from very high-resolution satellite imagery with the focus on object-level interpretation of individual buildings

UNetFormer -> A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery

BES-Net -> Boundary Enhancing Semantic Context Network for High-Resolution Image Semantic Segmentation. Applied to Vaihingen and Potsdam datasets

CVNet -> Contour Vibration Network for Building Extraction

CFENet -> A Context Feature Enhancement Network for Building Extraction from High-Resolution Remote Sensing Imagery

HiSup -> Accurate Polygonal Mapping of Buildings in Satellite Imagery

BuildingExtraction -> Building Extraction from Remote Sensing Images with Sparse Token Transformers

CrossGeoNet -> A Framework for Building Footprint Generation of Label-Scarce Geographical Regions

AFM_building -> Building Footprint Generation Through Convolutional Neural Networks With Attraction Field Representation

RAMP (Replicable AI for MicroPlanning) -> building detection in low and middle income countries

Building-instance-segmentation -> Multi-Modal Feature Fusion Network with Adaptive Center Point Detector for Building Instance Extraction

CGSANet -> A Contour-Guided and Local Structure-Aware Encoder–Decoder Network for Accurate Building Extraction From Very High-Resolution Remote Sensing Imagery

building-footprints-update -> Learning Color Distributions from Bitemporal Remote Sensing Images to Update Existing Building Footprints

RAMP -> model and buildings dataset to support a wide variety of humanitarian use cases

Thesis_Semantic_Image_Segmentation_on_Satellite_Imagery_using_UNets -> This master thesis aims to perform semantic segmentation of buildings on satellite images from the SpaceNet challenge 1 dataset using the U-Net architecture

HD-Net -> High-resolution decoupled network for building footprint extraction via deeply supervised body and boundary decomposition

RoofSense -> A novel deep learning solution for the automatic roofing material classification of the Dutch building stock using aerial imagery and laser scanning data fusion

IBS-AQSNet -> Enhanced Automated Quality Assessment Network for Interactive Building Segmentation in High-Resolution Remote Sensing Imagery

DeepMAO -> Deep Multi-scale Aware Overcomplete Network for Building Segmentation in Satellite Imagery

Segmentation - Solar panels

Deep-Learning-for-Solar-Panel-Recognition -> using both object detection with Yolov5 and Unet segmentation

DeepSolar -> A Machine Learning Framework to Efficiently Construct a Solar Deployment Database in the United States. Dataset on kaggle; actually used a CNN for classification, and segmentation is obtained by applying a threshold to the activation map. Original code is tf1 but tf2/keras and a pytorch implementation are available. Also check out Visualizations and in-depth analysis .. of the factors that can explain the adoption of solar energy in .. Virginia and DeepSolar tracker: towards unsupervised assessment with open-source data of the accuracy of deep learning-based distributed PV mapping

hyperion_solar_net -> trained classification & segmentation models on RGB imagery from Google Maps

3D-PV-Locator -> Large-scale detection of rooftop-mounted photovoltaic systems in 3D

PV_Pipeline -> DeepSolar for Germany

solar-panels-detection -> using SegNet, Fast SCNN & ResNet

predict_pv_yield -> Using optical flow & machine learning to predict PV yield

Large-scale-solar-plant-monitoring -> Remote Sensing for Monitoring of Photovoltaic Power Plants in Brazil Using Deep Semantic Segmentation

Panel-Segmentation -> Determine the presence of a solar array in the satellite image (boolean True/False), using a VGG16 classification model

Roofpedia -> an open registry of green roofs and solar roofs across the globe identified by Roofpedia through deep learning

Predicting the Solar Potential of Rooftops using Image Segmentation and Structured Data Medium article, using 20cm imagery & Unet

solar-pv-global-inventory

remote-sensing-solar-pv -> A repository for sharing progress on the automated detection of solar PV arrays in sentinel-2 remote sensing imagery

solar-panel-segmentation -> Finding solar panels using USGS satellite imagery

solar_seg -> Solar segmentation of PV modules (sub elements of panels) using drone images and fast.ai

solar_plant_detection -> boundary extraction of Photovoltaic (PV) plants using Mask RCNN and Amir dataset

SolarDetection -> unet on satellite image from the USA and France

adopptrs -> Automatic Detection Of Photovoltaic Panels Through Remote Sensing using unet & pytorch

solar-panel-locator -> the number of solar panel pixels was only ~0.2% of the total pixels in the dataset, so solar panel data was upsampled to account for the class imbalance

projects-solar-panel-detection -> List of projects to detect solar panels from aerial/satellite images

Satellite_ComputerVision -> UNET to detect solar arrays from Sentinel-2 data, using Google Earth Engine and Tensorflow. Also covers parking lot detection

photovoltaic-detection -> Detecting available rooftop area from satellite images to install photovoltaic panels

Solar_UNet -> U-Net models delineating solar arrays in Sentinel-2 imagery

SolarDetection-solafune -> Solar Panel Detection Using Sentinel-2 for the Solafune Competition

A Comparative Evaluation of Deep Learning Techniques for Photovoltaic Panel Detection from Aerial Images

Segmentation - Ships & vessels

Universal-segmentation-baseline-Kaggle-Airbus-Ship-Detection -> Kaggle Airbus Ship Detection Challenge - bronze medal solution

Airbus-Ship-Segmentation -> unet

contrastive_SSL_ship_detection -> Contrastive self supervised learning for ship detection in Sentinel 2 images

airbus-ship-detection -> using DeepLabV3+

Unet with web-application applied to Airbus ships

Segmentation - Other manmade

Aarsh2001/ML_Challenge_NRSC -> Electrical Substation detection

electrical_substation_detection

MCAN-OilSpillDetection -> Oil Spill Detection with A Multiscale Conditional Adversarial Network under Small Data Training

mining-detector -> detection of artisanal gold mines in Sentinel-2 satellite imagery for Amazon Mining Watch. Also covers clandestine airstrips

EG-UNet -> Deep Feature Enhancement Method for Land Cover With Irregular and Sparse Spatial Distribution Features: A Case Study on Open-Pit Mining

plastics -> Detecting and Monitoring Plastic Waste Aggregations in Sentinel-2 Imagery

MADOS -> Detecting Marine Pollutants and Sea Surface Features with Deep Learning in Sentinel-2 Imagery on the MADOS dataset

SADMA -> Residual Attention UNet on MARIDA: Marine Debris Archive is a marine debris-oriented dataset on Sentinel-2 satellite images

MAP-Mapper -> Marine Plastic Mapper is a tool for assessing marine macro-plastic density to identify plastic hotspots, underpinned by the MARIDA dataset.

Panoptic segmentation

Things and stuff or how remote sensing could benefit from panoptic segmentation

utae-paps -> PyTorch implementation of U-TAE and PaPs for satellite image time series panoptic segmentation

pastis-benchmark

Panoptic-Generator -> This module converts GIS data into panoptic segmentation tiles

BSB-Aerial-Dataset -> an example on how to use Detectron2's Panoptic-FPN in the BSB Aerial Dataset

Segmentation - Miscellaneous

seg-eval -> SegEval is a Python library that provides tools for evaluating semantic segmentation models. It generates evaluation regions and analyzes segmentation results within them.

awesome-satellite-images-segmentation

Satellite Image Segmentation: a Workflow with U-Net is a decent intro article

mmsegmentation -> Semantic Segmentation Toolbox with support for many remote sensing datasets including LoveDA, Potsdam, Vaihingen & iSAID

segmentation_gym -> A neural gym for training deep learning models to carry out geoscientific image segmentation

How to create a DataBlock for Multispectral Satellite Image Semantic Segmentation using Fastai

Using a U-Net for image segmentation, blending predicted patches smoothly is a must to please the human eye -> python code to blend predicted patches smoothly. See Satellite-Image-Segmentation-with-Smooth-Blending
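
As an illustration of the window-weighted blending these resources describe, here is a minimal numpy sketch; the square patches on a regular grid, the Hann window and the patch size are illustrative assumptions, not taken from the linked code:

```python
import numpy as np

def blend_patches(patches, positions, out_shape, patch_size=256):
    """patches: list of (patch_size, patch_size) predictions; positions: matching (y, x) offsets."""
    window = np.outer(np.hanning(patch_size), np.hanning(patch_size))  # tapers to 0 at the edges
    acc = np.zeros(out_shape, dtype=np.float64)      # weighted sum of predictions
    weights = np.zeros(out_shape, dtype=np.float64)  # accumulated window weights
    for pred, (y, x) in zip(patches, positions):
        acc[y:y + patch_size, x:x + patch_size] += pred * window
        weights[y:y + patch_size, x:x + patch_size] += window
    return acc / np.maximum(weights, 1e-8)           # normalise; guard pixels with no coverage
```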

DCA -> Deep Covariance Alignment for Domain Adaptive Remote Sensing Image Segmentation

SCAttNet -> Semantic Segmentation Network with Spatial and Channel Attention Mechanism

unetseg -> A set of classes and CLI tools for training a semantic segmentation model based on the U-Net architecture, using Tensorflow and Keras. This implementation is tuned specifically for satellite imagery and other geospatial raster data

Semantic Segmentation of Satellite Imagery using U-Net & fast.ai -> with repo

clusternet_segmentation -> Unsupervised Segmentation by applying K-Means clustering to the features generated by Neural Network

Efficient-Transformer -> Efficient Transformer for Remote Sensing Image Segmentation

weakly_supervised -> Weakly Supervised Deep Learning for Segmentation of Remote Sensing Imagery

HRCNet-High-Resolution-Context-Extraction-Network -> High-Resolution Context Extraction Network for Semantic Segmentation of Remote Sensing Images

Semantic segmentation of SAR images using a self supervised technique

satellite-segmentation-pytorch -> explores a wide variety of image augmentations to increase training dataset size

Spectralformer -> Rethinking hyperspectral image classification with transformers

Unsupervised Segmentation of Hyperspectral Remote Sensing Images with Superpixels

Semantic-Segmentation-with-Sparse-Labels

SNDF -> Superpixel-enhanced deep neural forest for remote sensing image semantic segmentation

Satellite-Image-Classification -> using random forest or support vector machines (SVM) and sklearn

dynamic-rs-segmentation -> Dynamic Multi-Context Segmentation of Remote Sensing Images based on Convolutional Networks

2023GRIC -> Combining UPerNet and ConvNeXt for Contrails Identification to reduce Global Warming

segmentation_models.pytorch -> Segmentation models with pretrained backbones, has been used in multiple winning solutions to remote sensing competitions

SSRN -> Spectral-Spatial Residual Network for Hyperspectral Image Classification: A 3-D Deep Learning Framework

SO-DNN -> Simplified object-based deep neural network for very high resolution remote sensing image classification

SANet -> Scale-Aware Network for Semantic Segmentation of High-Resolution Aerial Images

aerial-segmentation -> Learning Aerial Image Segmentation from Online Maps

IterativeSegmentation -> Recurrent Neural Networks to Correct Satellite Image Classification Maps

Detectron2 FPN + PointRend Model for amazing Satellite Image Segmentation -> 15% increase in accuracy when compared to the U-Net model

HybridSN -> Exploring 3D-2D CNN Feature Hierarchy for Hyperspectral Image Classification

TNNLS_2022_X-GPN -> Semisupervised Cross-scale Graph Prototypical Network for Hyperspectral Image Classification

singleSceneSemSegTgrs2022 -> Unsupervised Single-Scene Semantic Segmentation for Earth Observation

A-Fast-and-Compact-3-D-CNN-for-HSIC -> A Fast and Compact 3-D CNN for Hyperspectral Image Classification

HSNRS -> Hourglass-ShapeNetwork Based Semantic Segmentation for High Resolution Aerial Imagery

GiGCN -> Graph-in-Graph Convolutional Network for Hyperspectral Image Classification

SSAN -> Spectral-Spatial Attention Networks for Hyperspectral Image Classification

drone-images-semantic-segmentation -> Multiclass Semantic Segmentation of Aerial Drone Images Using Deep Learning

Satellite-Image-Segmentation-with-Smooth-Blending -> uses Smoothly-Blend-Image-Patches

BayesianUNet -> Pytorch Bayesian UNet model for segmentation and uncertainty prediction, applied to the Potsdam Dataset

RAANet -> A Residual ASPP with Attention Framework for Semantic Segmentation of High-Resolution Remote Sensing Images

wheelRuts_semanticSegmentation -> Mapping wheel-ruts from timber harvesting operations using deep learning techniques in drone imagery

LWN-for-UAVRSI -> Light-Weight Semantic Segmentation Network for UAV Remote Sensing Images, applied to Vaihingen, UAVid and UDD6 datasets

hypernet -> library which implements hyperspectral image (HSI) segmentation

ST-UNet -> Swin Transformer Embedding UNet for Remote Sensing Image Semantic Segmentation

EDFT -> Efficient Depth Fusion Transformer for Aerial Image Semantic Segmentation

WiCoNet -> Looking Outside the Window: Wide-Context Transformer for the Semantic Segmentation of High-Resolution Remote Sensing Images

CRGNet -> Consistency-Regularized Region-Growing Network for Semantic Segmentation of Urban Scenes with Point-Level Annotations

SA-UNet -> Improved U-Net Remote Sensing Classification Algorithm Fusing Attention and Multiscale Features

MANet -> Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images

BANet -> Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images

MACU-Net -> MACU-Net for Semantic Segmentation of Fine-Resolution Remotely Sensed Images

DNAS -> Decoupling Neural Architecture Search for High-Resolution Remote Sensing Image Semantic Segmentation

A2-FPN -> A2-FPN for Semantic Segmentation of Fine-Resolution Remotely Sensed Images

MAResU-Net -> Multi-stage Attention ResU-Net for Semantic Segmentation of Fine-Resolution Remote Sensing Images

ml_segmentation -> semantic segmentation of buildings using Random Forest, Support Vector Machine (SVM) & Gradient Boosting Classifier (GBC)

RSEN -> Robust Self-Ensembling Network for Hyperspectral Image Classification

MSNet -> multispectral semantic segmentation network for remote sensing images

k-textures -> K-textures, a self-supervised hard clustering deep learning algorithm for satellite image segmentation

Swin-Transformer-Semantic-Segmentation -> Satellite Image Semantic Segmentation

UDA_for_RS -> Unsupervised Domain Adaptation for Remote Sensing Semantic Segmentation with Transformer

A-3D-CNN-AM-DSC-model-for-hyperspectral-image-classification -> Attention Mechanism and Depthwise Separable Convolution Aided 3DCNN for Hyperspectral Remote Sensing Image Classification

contrastive-distillation -> A Contrastive Distillation Approach for Incremental Semantic Segmentation in Aerial Images

SegForestNet -> SegForestNet: Spatial-Partitioning-Based Aerial Image Segmentation

MFVNet -> MFVNet: Deep Adaptive Fusion Network with Multiple Field-of-Views for Remote Sensing Image Semantic Segmentation

Wildebeest-UNet -> detecting wildebeest and zebras in Serengeti-Mara ecosystem from very-high-resolution satellite imagery

segment-anything-eo -> Earth observation tools for Meta AI Segment Anything (SAM - Segment Anything Model)

HR-Image-classification_SDF2N -> A Shallow-to-Deep Feature Fusion Network for VHR Remote Sensing Image Classification

sink-seg -> Automatic Segmentation of Sinkholes Using a Convolutional Neural Network

Tiling and Stitching Segmentation Output for Remote Sensing: Basic Challenges and Recommendations

EMRT -> Enhancing Multiscale Representations With Transformer for Remote Sensing Image Semantic Segmentation

CMTFNet -> CMTFNet: CNN and Multiscale Transformer Fusion Network for Remote Sensing Image Semantic Segmentation

CM-UNet -> Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation

Using Stable Diffusion to Improve Image Segmentation Models -> Augmenting Data with Stable Diffusion

SSRS -> Semantic Segmentation for Remote Sensing, multiple networks implemented

Instance segmentation

In instance segmentation, each individual 'instance' of a segmented area is given a unique label. For detection of very small objects this may be a good approach, but it can struggle to separate individual objects that are closely spaced.
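
As a minimal illustration of the difference from semantic segmentation, the sketch below derives instance labels from a binary mask with connected-component labelling; the toy mask is invented, and the two touching blobs merging into one label is exactly the failure mode mentioned above:

```python
import numpy as np
from scipy import ndimage

# Toy binary mask: two isolated blobs plus two blobs that touch each other
mask = np.zeros((10, 10), dtype=np.uint8)
mask[0:2, 0:2] = 1   # blob A
mask[0:2, 7:9] = 1   # blob B
mask[6:8, 0:4] = 1   # blob C ...
mask[6:8, 3:7] = 1   # ... blob D, overlapping C by one column

labels, num_instances = ndimage.label(mask)  # connected-component labelling
print(num_instances)  # prints 3, not 4: the touching blobs C and D share a label
```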

Mask_RCNN generates bounding boxes and segmentation masks for each instance of an object in the image. It is very commonly used for instance segmentation & object detection

Instance segmentation of center pivot irrigation system in Brazil using free Landsat images, mask R-CNN & Keras

Building-Detection-MaskRCNN -> Building detection from the SpaceNet dataset by using Mask RCNN

Oil tank instance segmentation with Mask R-CNN with accompanying article using Keras & Airbus Oil Storage Detection Dataset on Kaggle

Mask_RCNN-for-Caravans -> detect caravan footprints from OS imagery

parking_bays_detectron2 -> Detecting parking bays with satellite imagery. Used Detectron2 and synthetic data with Unreal, superior performance to using Mask RCNN

Locate buildings with a dark roof that feed the heat island phenomenon using Mask RCNN -> with repo, used INRIA dataset & labelme for annotation

Circle_Finder -> Circular Shapes Detection in Satellite Imagery, 2nd place solution to the Circle Finder Challenge

Lawn_maskRCNN -> Detecting lawns from satellite images of properties in the Cedar Rapids area using Mask-R-CNN

CropMask_RCNN -> Segmenting center pivot agriculture to monitor crop water use in drylands with Mask R-CNN and Landsat satellite imagery

CATNet -> Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images

Object-Detection-on-Satellite-Images-using-Mask-R-CNN -> detect ships

FactSeg -> Foreground Activation Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery (TGRS), also see FarSeg and FreeNet , implementations of research paper

aqua_python -> detecting aquaculture farms using Mask R-CNN

RSPrompter -> Learning to Prompt for Remote Sensing Instance Segmentation based on Visual Foundation Model

Object detection

Object detection in remote sensing involves locating and surrounding objects of interest with bounding boxes. Due to the large size of remote sensing images and the fact that objects may only comprise a few pixels, object detection can be challenging in this context. The imbalance between the area of the objects to be detected and the background, combined with the potential for objects to be easily confused with random features in the background, further complicates the task. Object detection generally performs better on larger objects, but becomes increasingly difficult as the objects become smaller and more densely packed. The accuracy of object detection models can also degrade rapidly as image resolution decreases, which is why it is common to use high resolution imagery, such as 30cm RGB, for object detection in remote sensing. A unique characteristic of aerial images is that objects can be oriented in any direction. To effectively extract measurements of the length and width of an object, it can be crucial to use rotated bounding boxes that align with the orientation of the object. This approach enables more accurate and meaningful analysis of the objects within the image.
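
As a small, hypothetical example of why rotated boxes matter for measurement, the sketch below fits a rotated rectangle to an object mask with OpenCV and converts its sides to metres using an assumed 30 cm ground sample distance (the mask and GSD are placeholders):

```python
import cv2
import numpy as np

GSD_M_PER_PX = 0.3  # assumed 30 cm imagery

# Hypothetical binary mask of one detected object (e.g. from a segmentation model)
mask = np.zeros((100, 100), dtype=np.uint8)
cv2.fillConvexPoly(mask, np.array([[20, 40], [70, 20], [75, 35], [25, 55]], dtype=np.int32), 1)

ys, xs = np.nonzero(mask)
points = np.column_stack([xs, ys]).astype(np.float32)
(cx, cy), (w, h), angle = cv2.minAreaRect(points)    # rotated rectangle fit to the mask
corners = cv2.boxPoints(((cx, cy), (w, h), angle))   # its four corner points
length_px, width_px = max(w, h), min(w, h)
print(f"length ~ {length_px * GSD_M_PER_PX:.1f} m, "
      f"width ~ {width_px * GSD_M_PER_PX:.1f} m, angle {angle:.1f} deg")
```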

Object tracking in videos

TCTrack -> Temporal Contexts for Aerial Tracking

CFME -> Object Tracking in Satellite Videos by Improved Correlation Filters With Motion Estimations

TGraM -> Multi-Object Tracking in Satellite Videos with Graph-Based Multi-Task Modeling

satellite_video_mod_groundtruth -> groundtruth on satellite video for evaluating moving object detection algorithm

Moving-object-detection-DSFNet -> DSFNet: Dynamic and Static Fusion Network for Moving Object Detection in Satellite Videos

HiFT -> Hierarchical Feature Transformer for Aerial Tracking

Object detection with rotated bounding boxes

Oriented bounding boxes (OBB) are polygons representing rotated rectangles. For datasets check out DOTA & HRSC2016. Start with Yolov8
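
For orientation, here is a minimal numpy sketch of the common (cx, cy, w, h, angle) OBB parameterisation converted into the four polygon corners used in DOTA-style annotations; the helper function is illustrative and not taken from any library listed below:

```python
import numpy as np

def obb_to_corners(cx, cy, w, h, angle_deg):
    """Return the 4 corner points of a rotated rectangle as an array of shape (4, 2)."""
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    # Axis-aligned half-extents, then rotate and translate to the centre
    halves = np.array([[-w / 2, -h / 2], [w / 2, -h / 2],
                       [w / 2,  h / 2], [-w / 2,  h / 2]])
    return halves @ rot.T + np.array([cx, cy])

print(obb_to_corners(50, 50, 40, 10, 30))  # x1 y1 ... x4 y4 once flattened
```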

mmrotate -> Rotated Object Detection Benchmark, with pretrained models and function for inferencing on very large images

OBBDetection -> an oriented object detection library, which is based on MMdetection

rotate-yolov3 -> Rotation object detection implemented with yolov3. Also see yolov3-polygon

DRBox -> for detection tasks where the objects are orientated arbitrarily, e.g. vehicles, ships and airplanes

s2anet -> Align Deep Features for Oriented Object Detection

CFC-Net -> A Critical Feature Capturing Network for Arbitrary-Oriented Object Detection in Remote Sensing Images

ReDet -> A Rotation-equivariant Detector for Aerial Object Detection

BBAVectors-Oriented-Object-Detection -> Oriented Object Detection in Aerial Images with Box Boundary-Aware Vectors

CSL_RetinaNet_Tensorflow -> Arbitrary-Oriented Object Detection with Circular Smooth Label

r3det-on-mmdetection -> R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object

R-DFPN_FPN_Tensorflow -> Rotation Dense Feature Pyramid Networks (Tensorflow)

R2CNN_Faster-RCNN_Tensorflow -> Rotational region detection based on Faster-RCNN

Rotated-RetinaNet -> implemented in pytorch, it supports the following datasets: DOTA, HRSC2016, ICDAR2013, ICDAR2015, UCAS-AOD, NWPU VHR-10, VOC2007

OBBDet_Swin -> The sixth place winning solution in 2021 Gaofen Challenge

CG-Net -> Learning Calibrated-Guidance for Object Detection in Aerial Images

OrientedRepPoints_DOTA -> Oriented RepPoints + Swin Transformer/ReResNet

yolov5_obb -> yolov5 + Oriented Object Detection

How to Train YOLOv5 OBB -> YOLOv5 OBB tutorial and YOLOv5 OBB notebook

OHDet_Tensorflow -> can be applied to rotation detection and object heading detection

Seodore -> framework maintaining recent updates of mmdetection

Rotation-RetinaNet-PyTorch -> oriented detector Rotation-RetinaNet implementation on Optical and SAR ship dataset

AIDet -> an open source object detection in aerial image toolbox based on MMDetection

rotation-yolov5 -> rotation detection based on yolov5

ShipDetection -> Ship Detection in HR Optical Remote Sensing Images via Rotated Bounding Box, based on Faster R-CNN and ORN, uses caffe

SLRDet -> project based on mmdetection to reimplement RRPN and use the model Faster R-CNN OBB

AxisLearning -> Axis Learning for Orientated Objects Detection in Aerial Images

Detection_and_Recognition_in_Remote_Sensing_Image -> This work uses PaNet for detection and recognition in remote sensing images, implemented in MXNet

DrBox-v2-tensorflow -> tensorflow implementation of DrBox-v2 which is an improved detector with rotatable boxes for target detection in remote sensing images

Rotation-EfficientDet-D0 -> A PyTorch Implementation Rotation Detector based EfficientDet Detector, applied to custom rotation vehicle datasets

DODet -> Dual alignment for oriented object detection, uses DOTA dataset

GF-CSL -> Gaussian Focal Loss: Learning Distribution Polarized Angle Prediction for Rotated Object Detection in Aerial Images

simplified_rbox_cnn -> RBox-CNN: rotated bounding box based CNN for ship detection in remote sensing image. Uses Tensorflow object detection API

Polar-Encodings -> Learning Polar Encodings for Arbitrary-Oriented Ship Detection in SAR Images

R-CenterNet -> detector for rotated-object based on CenterNet

piou -> Orientated Object Detection; IoU Loss, applied to DOTA dataset

DAFNe -> A One-Stage Anchor-Free Approach for Oriented Object Detection

AProNet -> Detecting objects with precise orientation from aerial images. Applied to datasets DOTA and HRSC2016

UCAS-AOD-benchmark -> A benchmark of UCAS-AOD dataset

RotateObjectDetection -> based on Ultralytics/yolov5, with adjustments to enable rotate prediction boxes. Also see PolygonObjectDetection

AD-Toolbox -> Aerial Detection Toolbox based on MMDetection and MMRotate, with support for more datasets

GGHL -> A General Gaussian Heatmap Label Assignment for Arbitrary-Oriented Object Detection

NPMMR-Det -> A Novel Nonlocal-Aware Pyramid and Multiscale Multitask Refinement Detector for Object Detection in Remote Sensing Images

AOPG -> Anchor-Free Oriented Proposal Generator for Object Detection

SE2-Det -> Semantic-Edge-Supervised Single-Stage Detector for Oriented Object Detection in Remote Sensing Imagery

OrientedRepPoints -> Oriented RepPoints for Aerial Object Detection

TS-Conv -> Task-wise Sampling Convolutions for Arbitrary-Oriented Object Detection in Aerial Images

FCOSR -> A Simple Anchor-free Rotated Detector for Aerial Object Detection. This implementation is modified from mmdetection. See also TensorRT_Inference

OBB_Detection -> Finalist's solution in the track of Oriented Object Detection in Remote Sensing Images, 2022 Guangdong-Hong Kong-Macao Greater Bay Area International Algorithm Competition

sam-mmrotate -> SAM (Segment Anything Model) for generating rotated bounding boxes with MMRotate, which is a comparison method of H2RBox-v2

mmrotate-dcfl -> Dynamic Coarse-to-Fine Learning for Oriented Tiny Object Detection

h2rbox-mmrotate -> Horizontal Box Annotation is All You Need for Oriented Object Detection

Spatial-Transform-Decoupling -> Spatial Transform Decoupling for Oriented Object Detection

ARS-DETR -> Aspect Ratio Sensitive Oriented Object Detection with Transformer

CFINet -> Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning. Introduces SODA-A dataset

Object detection enhanced by super resolution

Super-Resolution and Object Detection -> Super-resolution is a relatively inexpensive enhancement that can improve object detection performance

Mid-Low Resolution Remote Sensing Ship Detection Using Super-Resolved Feature Representation

EESRGAN -> Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network. Applied to COWC & OGST datasets

FBNet -> Feature Balance for Fine-Grained Object Classification in Aerial Images

SuperYOLO -> SuperYOLO: Super Resolution Assisted Object Detection in Multimodal Remote Sensing Imagery

Salient object detection

Detecting the most noticeable or important object in a scene

ACCoNet -> Adjacent Context Coordination Network for Salient Object Detection in Optical Remote Sensing Images

MCCNet -> Multi-Content Complementation Network for Salient Object Detection in Optical Remote Sensing Images

CorrNet -> Lightweight Salient Object Detection in Optical Remote Sensing Images via Feature Correlation

Reading list for deep learning based Salient Object Detection in Optical Remote Sensing Images

ORSSD-dataset -> salient object detection dataset

EORSSD-dataset -> Extended Optical Remote Sensing Saliency Detection (EORSSD) Dataset

DAFNet_TIP20 -> Dense Attention Fluid Network for Salient Object Detection in Optical Remote Sensing Images

EMFINet -> Edge-Aware Multiscale Feature Integration Network for Salient Object Detection in Optical Remote Sensing Images

ERPNet -> Edge-guided Recurrent Positioning Network for Salient Object Detection in Optical Remote Sensing Images

FSMINet -> Fully Squeezed Multi-Scale Inference Network for Fast and Accurate Saliency Detection in Optical Remote Sensing Images

AGNet -> AGNet: Attention Guided Network for Salient Object Detection in Optical Remote Sensing Images

MSCNet -> A lightweight multi-scale context network for salient object detection in optical remote sensing images

GPnet -> Global Perception Network for Salient Object Detection in Remote Sensing Images

SeaNet -> Lightweight Salient Object Detection in Optical Remote Sensing Images via Semantic Matching and Edge Alignment

GeleNet -> Salient Object Detection in Optical Remote Sensing Images Driven by Transformer

Object detection - Buildings, rooftops & solar panels

satellite_image_tinhouse_detector -> Detection of tin houses from satellite/aerial images using the Tensorflow Object Detection API

Machine Learning For Rooftop Detection and Solar Panel Installment discusses tiling large images and generating annotations from OSM data. Features of the roofs were calculated using a combination of contour detection and classification. Follow up article using semantic segmentation

Building Extraction with YOLT2 and SpaceNet Data

XBD-hurricanes -> Models for building (and building damage) detection in high-resolution (<1m) satellite and aerial imagery using a modified RetinaNet model

Detecting solar panels from satellite imagery using segmentation

ssd-spacenet -> Detect buildings in the Spacenet dataset using Single Shot MultiBox Detector (SSD)

3DBuildingInfoMap -> simultaneous extraction of building height and footprint from Sentinel imagery using ResNet

DeepSolaris -> a EuroStat project to detect solar panels in aerial images, further material here

ML_ObjectDetection_CAFO -> Detect Concentrated Animal Feeding Operations (CAFO) in Satellite Imagery

Multi-level-Building-Detection-Framework -> Multilevel Building Detection Framework in Remote Sensing Images Based on Convolutional Neural Networks

Automatic Damage Annotation on Post-Hurricane Satellite Imagery -> detect damaged buildings using tensorflow object detection API. With repos here and here

mappingchallenge -> YOLOv5 applied to the AICrowd Mapping Challenge dataset

Object detection - Ships, boats, vessels & wake

Airbus Ship Detection Challenge -> using oriented bounding boxes. Read Detecting ships in satellite imagery: five years later…

kaggle-ships-in-Google-Earth-yolov8 -> Applying YOLOv8 to Kaggle Ships in Google Earth dataset

How hard is it for an AI to detect ships on satellite images?

Object Detection in Satellite Imagery, a Low Overhead Approach

Detecting Ships in Satellite Imagery using the Planet dataset and Keras

Ship detection using k-means clustering & CNN classifier on patches

SARfish -> Ship detection in Sentinel 1 Synthetic Aperture Radar (SAR) imagery

Arbitrary-Oriented Ship Detection through Center-Head Point Extraction

ship_detection -> using an interesting combination of CNN classifier, Class Activation Mapping (CAM) & UNET segmentation

Building a complete Ship detection algorithm using YOLOv3 and Planet satellite images -> covers finding and annotating data (using LabelMe), preprocessing large images into chips, and training Yolov3. Repo

Ship-detection-in-satellite-images -> experiments with UNET, YOLO, Mask R-CNN, SSD, Faster R-CNN, RETINA-NET

Ship-Detection-from-Satellite-Images-using-YOLOV4 -> uses Kaggle Airbus Ship Detection dataset

shipsnet-detector -> Detect container ships in Planet imagery using machine learning

Classifying Ships in Satellite Imagery with Neural Networks -> applied to the Kaggle Ships in Satellite Imagery dataset

Mask R-CNN for Ship Detection & Segmentation blog post with repo

Boat detection with multi-region-growing method in satellite images

small-boat-detector -> Trained yolo v3 model weights and configuration file to detect small boats in satellite imagery

Satellite-Imagery-Datasets-Containing-Ships -> A list of optical and radar satellite datasets for ship detection, classification, semantic segmentation and instance segmentation tasks

vessel-detection-sentinels -> Sentinel-1 and Sentinel-2 Vessel Detection

Ship-Detection -> CNN approach for ship detection in the ocean using a satellite image

vesselTracker -> Project based on reduced model of Yolov5 architecture using Pytorch. Custom dataset based on SAR imagery provided by Sentinel-1 through Earth Engine API

marine-debris-ml-model -> Marine Debris Detection using tensorflow object detection API

SDGH-Net -> Ship Detection in Optical Remote Sensing Images Based on Gaussian Heatmap Regression

LR-TSDet -> LR-TSDet: Towards Tiny Ship Detection in Low-Resolution Remote Sensing Images

FGSCR-42 -> A public Dataset for Fine-Grained Ship Classification in Remote sensing images

WakeNet -> Rethinking Automatic Ship Wake Detection: State-of-the-Art CNN-based Wake Detection via Optical Images

Histogram of Oriented Gradients (HOG) Boat Heading Classification

Object Detection in Satellite Imagery, a Low Overhead Approach -> Medium article which demonstrates how to combine Canny edge detector pre-filters with HOG feature descriptors, random forest classifiers, and sliding windows to perform ship detection
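
A rough sketch of that classical pipeline (HOG features, a random forest classifier and a sliding window) is shown below; the chips, labels, window size and stride are hypothetical placeholders rather than the article's actual code:

```python
import numpy as np
from skimage.feature import hog
from sklearn.ensemble import RandomForestClassifier

WIN = 64  # sliding-window size in pixels (assumed)

def hog_features(chip):
    return hog(chip, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# Hypothetical training chips: grayscale WIN x WIN arrays with ship / no-ship labels
train_chips = [np.random.rand(WIN, WIN) for _ in range(20)]
train_labels = np.random.randint(0, 2, size=20)
clf = RandomForestClassifier(n_estimators=100).fit(
    [hog_features(c) for c in train_chips], train_labels)

# Slide the window over a larger scene and keep windows scored as "ship"
scene = np.random.rand(512, 512)
detections = []
for y in range(0, scene.shape[0] - WIN, WIN // 2):
    for x in range(0, scene.shape[1] - WIN, WIN // 2):
        chip = scene[y:y + WIN, x:x + WIN]
        if clf.predict([hog_features(chip)])[0] == 1:
            detections.append((x, y, WIN, WIN))
```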

Ship-Detection-based-on-YOLOv3-and-KV260 -> entry project of the Xilinx Adaptive Computing Challenge 2021. It uses YOLOv3 for ship target detection in optical remote sensing images, and deploys DPU on the KV260 platform to achieve hardware acceleration

LEVIR-Ship -> a dataset for tiny ship detection under medium-resolution remote sensing images

Push-and-Pull-Network -> Contrastive Learning for Fine-grained Ship Classification in Remote Sensing Images

DRENet -> A Degraded Reconstruction Enhancement-Based Method for Tiny Ship Detection in Remote Sensing Images With a New Large-Scale Dataset

xView3-The-First-Place-Solution -> A winning solution for xView 3 challenge (Vessel detection, classification and length estimation on Sentinel-1 images). Contains trained models, inference pipeline and training code & configs to reproduce the results.

vessel-detection-viirs -> Model and service code for streaming vessel detections from VIIRS satellite imagery

wakemodel_llmassist -> wake detection in Sentinel-2, uses an EfficientNet-B0 architecture adapted for keypoint detection

ORFENet -> Tiny Object Detection in Remote Sensing Images Based on Object Reconstruction and Multiple Receptive Field Adaptive Feature Enhancement. Uses LEVIR-Ship & AI-TODv2 datasets

Object detection - Cars, vehicles & trains

Detection of parking lots and driveways with RetinaNet

pytorch-vedai -> object detection on the VEDAI dataset: Vehicle Detection in Aerial Imagery

Truck Detection with Sentinel-2 during COVID-19 crisis -> moving objects in Sentinel-2 data causes a specific reflectance relationship in the RGB, which looks like a rainbow, and serves as a marker for trucks. Improve accuracy by only analysing roads. Not using object detection but relevant. Also see S2TD

cowc_car_counting -> car counting on the Cars Overhead With Context (COWC) dataset. Not strictly object detection but a CNN to predict the car count in a tile

CarCounting -> using Yolov3 & COWC dataset

Traffic density estimation as a regression problem instead of object detection

Rotation-EfficientDet-D0 -> PyTorch implementation of Rotated EfficientDet, applied to a custom rotation vehicle dataset (car counting)

RSVC2021-Dataset -> A dataset for Vehicle Counting in Remote Sensing images, created from the DOTA & ITCVD

Car Localization and Counting with Overhead Imagery, an Interactive Exploration -> Medium article by Adam Van Etten

Vehicle-Counting-in-Very-Low-Resolution-Aerial-Images -> Vehicle Counting in Very Low-Resolution Aerial Images via Cross-Resolution Spatial Consistency and Intraresolution Time Continuity

Vehicle Detection blog post by Grant Pearse: detecting vehicles across New Zealand without collecting local training data

detecting-trucks -> detecting large vehicles in Sentinel-2

Object detection - Planes & aircraft

FlightScope_Bench -> A Deep Comprehensive Assessment of Aircraft Detection Algorithms in Satellite Imagery, including Faster RCNN, DETR, SSD, RTMdet, RetinaNet, CenterNet, YOLOv5, and YOLOv8

Faster RCNN to detect airplanes

yoltv4 includes examples on the RarePlanes dataset

aircraft-detection -> experiments to test the performance of a Gaussian process (GP) classifier with various kernels on the UC Merced land use land cover (LULC) dataset

aircraft-detection-from-satellite-images-yolov3 -> trained on kaggle cgi-planes-in-satellite-imagery-w-bboxes dataset

HRPlanesv2-Data-Set -> YOLOv4 and YOLOv5 weights trained on the HRPlanesv2 dataset

Deep-Learning-for-Aircraft-Recognition -> A CNN model trained to classify and identify various military aircraft through satellite imagery

FRCNN-for-Aircraft-Detection

ergo-planes-detector -> An ergo based project that relies on a convolutional neural network to detect airplanes from satellite imagery, uses the PlanesNet dataset

pytorch-remote-sensing -> Aircraft detection using the 'Airbus Aircraft Detection' dataset and Faster-RCNN with ResNet-50 backbone using pytorch

FasterRCNN_ObjectDetection -> faster RCNN model for aircraft detection and localisation in satellite images and creating a webpage with live server for public usage

HRPlanes -> weights of YOLOv4 and Faster R-CNN networks trained with HRPlanes dataset

aerial-detection -> uses Yolov5 & Icevision

How to choose a deep learning architecture to detect aircrafts in satellite imagery?

rareplanes-yolov5 -> using YOLOv5 and the RarePlanes dataset to detect and classify sub-characteristics of aircraft, with article

OnlyPlanes -> Incrementally Tuning Synthetic Training Datasets for Satellite Object Detection

Understanding the RarePlanes Dataset and Building an Aircraft Detection Model -> blog post

Object detection - Infrastructure & utilities

wind-turbine-detector -> Wind Turbine Object Detection from Aerial Imagery Using TensorFlow Object Detection API

Water Tanks and Swimming Pools Detection -> uses Faster R-CNN

PCAN -> Part-Based Context Attention Network for Thermal Power Plant Detection in Remote Sensing Imagery, with dataset

WindTurbineDetection -> Implementation of transfer learning approach using the YOLOv7 framework to detect and rapidly quantify wind turbines in raw LANDSAT and NAIP satellite imagery

Object detection - Oil storage tank detection

Oil is stored in tanks at many points between extraction and sale, and the volume of oil in storage is an important economic indicator.
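
One commonly described trick for floating-roof tanks, used in several of the guides below, is to compare the shadow cast outside the tank with the shadow cast onto the floating roof; a back-of-the-envelope sketch, with hypothetical shadow lengths, might look like this:

```python
# The exterior shadow scales with the full tank height; the shadow falling onto the
# floating roof scales with how far the roof has dropped, so their ratio approximates
# how empty the tank is. Lengths are hypothetical pixel measurements.
def tank_occupancy(exterior_shadow_px: float, interior_shadow_px: float) -> float:
    """Estimated fill fraction of a floating-roof tank from its two shadows."""
    emptiness = interior_shadow_px / exterior_shadow_px
    return max(0.0, min(1.0, 1.0 - emptiness))

print(tank_occupancy(exterior_shadow_px=42.0, interior_shadow_px=10.5))  # ~0.75
```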

A Beginner’s Guide To Calculating Oil Storage Tank Occupancy With Help Of Satellite Imagery

Oil Storage Tank’s Volume Occupancy On Satellite Imagery Using YoloV3 with repo

Oil-Tank-Volume-Estimation -> combines object detection and classical computer vision

SubpixelCircleDetection -> CIRCULAR-SHAPED OBJECT DETECTION IN LOW RESOLUTION SATELLITE IMAGES

oil_storage-detector -> using yolov5 and the Airbus Oil Storage Detection dataset

oil_well_detector -> detect oil wells in the Bakken oil field based on satellite imagery

Oil Storage Detection on Airbus Imagery with YOLOX -> uses the Kaggle Airbus Oil Storage Detection dataset

AContrarioTankDetection -> Oil Tank Detection in Satellite Images via a Contrario Clustering

Object detection - Animals

A variety of techniques can be used to count animals, including object detection and instance segmentation. For convenience they are all listed here:

cownter_strike -> counting cows, located with point-annotations, two models: CSRNet (a density-based method) & LCFCN (a detection-based method)

elephant_detection -> Using Keras-Retinanet to detect elephants from aerial images

CNN-Mosquito-Detection -> determining the locations of potentially dangerous breeding grounds, compared YOLOv4, YOLOR & YOLOv5

Borowicz_etal_Spacewhale -> locate whales using ResNet

walrus-detection-and-count -> uses Mask R-CNN instance segmentation

MarineMammalsDetection -> Weakly Supervised Detection of Marine Animals in High Resolution Aerial Images

Audubon_F21 -> Deep object detection for waterbird monitoring using aerial imagery

Object detection - Miscellaneous

Object detection on Satellite Imagery using RetinaNet -> using the Kaggle Swimming Pool and Car Detection dataset

Tackling the Small Object Problem in Object Detection

Object Detection and Image Segmentation with Deep Learning on Earth Observation Data: A Review

awesome-aerial-object-detection by murari023, another by visionxiang and awesome-tiny-object-detection list many relevant papers

Object Detection Accuracy as a Function of Image Resolution -> Medium article using COWC dataset, performance rapidly degrades below 30cm imagery

Satellite Imagery Multiscale Rapid Detection with Windowed Networks (SIMRDWN) -> combines some of the leading object detection algorithms into a unified framework designed to detect objects both large and small in overhead imagery. Train models and test on arbitrary image sizes with YOLO (versions 2 and 3), Faster R-CNN, SSD, or R-FCN.

YOLTv4 -> YOLTv4 is designed to detect objects in aerial or satellite imagery in arbitrarily large images that far exceed the ~600×600 pixel size typically ingested by deep learning object detection frameworks. Read Announcing YOLTv4: Improved Satellite Imagery Object Detection
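
The tiling idea behind YOLT-style inference can be sketched in a few lines of numpy; this is an illustrative outline, not YOLT's actual code, and `detect_fn` stands in for any chip-level detector:

```python
import numpy as np

def iter_chips(image, chip_size=608, overlap=100):
    """Yield (x0, y0, chip) over an image of shape (H, W, C)."""
    step = chip_size - overlap
    h, w = image.shape[:2]
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            yield x0, y0, image[y0:y0 + chip_size, x0:x0 + chip_size]

def detect_large_image(image, detect_fn):
    """detect_fn(chip) -> list of (x, y, w, h, score) in chip coordinates."""
    boxes = []
    for x0, y0, chip in iter_chips(image):
        for (x, y, w, h, score) in detect_fn(chip):
            boxes.append((x + x0, y + y0, w, h, score))  # shift back to full-image coords
    return boxes  # in practice, follow with non-max suppression across chips
```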

Tensorflow Benchmarks for Object Detection in Aerial Images

Pytorch Benchmarks for Object Detection in Aerial Images

ASPDNet -> Counting dense objects in remote sensing images

xview-yolov3 -> xView 2018 Object Detection Challenge: YOLOv3 Training and Inference

Faster RCNN for xView satellite data challenge

How to detect small objects in (very) large images -> A practical guide to using Slicing-Aided Hyper Inference (SAHI) for performing inference on the DOTAv1.0 object detection dataset using the mmdetection framework

Object Detection Satellite Imagery Multi-vehicles Dataset (SIMD) -> RetinaNet, Yolov3 and Faster RCNN for multi object detection on satellite images dataset

SNIPER/AutoFocus -> an efficient multi-scale object detection training/inference algorithm

marine_debris_ML -> Marine debris detection, uses 3-meter imagery product called Planetscope with bands in the red, green, blue, and near-infrared. Uses Tensorflow Object Detection API with pre-trained resnet 101

pool-detection-from-aerial-imagery -> Use Icevision and Detectron2 to detect swimming pools from aerial imagery

Electric-Pylon-Detection-in-RSI -> a dataset which contains 1500 remote sensing images of electric pylons used to train ten deep learning models

IS-Count -> IS-Count is a sampling-based and learnable method for estimating the total object count in a region

Object Detection On Aerial Imagery Using RetinaNet

Clustered-Object-Detection-in-Aerial-Image

yolov5s_for_satellite_imagery -> yolov5s applied to the DOTA dataset

RetinaNet-PyTorch -> RetinaNet implementation on remote sensing ship dataset (SSDD)

Detecting-Cyclone-Centers-Custom-YOLOv3 -> tropical cyclones (TCs) are intense warm-cored cyclonic vortices, developed from low-pressure systems over the tropical oceans and driven by complex air-sea interaction

Object-Detection-YoloV3-RetinaNet-FasterRCNN -> trained on a private dataset

Google-earth-Object-Recognition -> Code for training and evaluating on Dior Dataset (Google Earth Images) using RetinaNet and YOLOV5

HIECTOR: Hierarchical object detector at scale -> HIECTOR facilitates multiple satellite data collections of increasingly detailed spatial resolution for a cost-efficient and accurate object detection over large areas. Code

Detection of Multiclass Objects in Optical Remote Sensing Images -> Detection of Multiclass Objects in Optical Remote Sensing Images

SB-MSN -> Improving Training Instance Quality in Aerial Image Object Detection With a Sampling-Balance-Based Multistage Network

yoltv5 -> detects objects in arbitrarily large aerial or satellite images that far exceed the ~600×600 pixel size typically ingested by deep learning object detection frameworks. Uses YOLOv5 & pytorch

AIR -> A deep learning object detector framework written in Python for supporting Land Search and Rescue Missions

dior_detect -> benchmarks for object detection on DIOR dataset

Panchromatic to Multispectral: Object Detection Performance as a Function of Imaging Bands -> Medium article, concludes that more bands are not always beneficial, but likely varies by use case

OPLD-Pytorch -> Learning Point-Guided Localization for Detection in Remote Sensing Images

F3Net -> Feature Fusion and Filtration Network for Object Detection in Optical Remote Sensing Images

GLNet -> Global to Local: Clip-LSTM-Based Object Detection From Remote Sensing Images

SRAF-Net -> A Scene-Relevant Anchor-Free Object Detection Network in Remote Sensing Images

object_detection_in_remote_sensing_images -> using CNN and attention mechanism

SHAPObjectDetection -> SHAP-Based Interpretable Object Detection Method for Satellite Imagery

NWD -> A Normalized Gaussian Wasserstein Distance for Tiny Object Detection. Uses AI-TOD dataset

MSFC-Net -> Multiscale Semantic Fusion-Guided Fractal Convolutional Object Detection Network for Optical Remote Sensing Imagery

LO-Det -> LO-Det: Lightweight Oriented Object Detection in Remote Sensing Images

R2IPoints -> Pursuing Rotation-Insensitive Point Representation for Aerial Object Detection

Object-Detection -> Multi-Scale Object Detection with the Pixel Attention Mechanism in a Complex Background

mmdet-rfla -> RFLA: Gaussian Receptive based Label Assignment for Tiny Object Detection

Interactive-Multi-Class-Tiny-Object-Detection -> Interactive Multi-Class Tiny-Object Detection

small-object-detection-benchmark -> Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection (SAHI)

OD-Satellite-iSAID -> Object Detection in Aerial Images: A Case Study on Performance Improvement using iSAID

Large-Selective-Kernel-Network -> Large Selective Kernel Network for Remote Sensing Object Detection

Satellite_Imagery_Detection_YOLOV7 -> YOLOV7 applied to xView1 Dataset

FSANet -> FSANet: Feature-and-Spatial-Aligned Network for Tiny Object Detection in Remote Sensing Images

OAN -> Fewer is More: Efficient Object Detection in Large Aerial Images, based on MMdetection

DOTA-C -> evaluating the robustness of object detection models to 19 types of image quality degradation

Satellite-Remote-Sensing-Image-Object-Detection -> using RefineDet & DOTA dataset

Object counting

When only the object count, and not its shape, is required, a U-Net can be used to treat counting as an image-to-image translation problem (see the sketch below).
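
The sketch below illustrates this counting-by-regression idea: a toy encoder-decoder (standing in for a full U-Net) predicts a per-pixel density map, and the count is taken as the sum of that map. The model, tile size and data are illustrative placeholders, not any of the implementations listed here.

```python
# Minimal sketch: object counting as image-to-image translation.
# A U-Net style model (here a toy stand-in) predicts a per-pixel density map;
# the object count is simply the sum of the predicted densities.
import torch
import torch.nn as nn

class TinyDensityNet(nn.Module):
    """Toy encoder-decoder standing in for a full U-Net."""
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.Conv2d(16, 1, 1),  # single-channel density map
        )

    def forward(self, x):
        return torch.relu(self.decode(self.encode(x)))

model = TinyDensityNet()
image = torch.rand(1, 3, 256, 256)        # placeholder RGB satellite tile
density = model(image)                    # (1, 1, 256, 256) density map
estimated_count = density.sum().item()    # count = integral of the density map
print(f"estimated objects: {estimated_count:.1f}")
```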

centroid-unet -> Centroid-UNet is a deep neural network model for detecting centroids in satellite images

DO-U-Net -> an effective approach when both the size of an object and the number of objects in the image need to be known; initially created to segment and count Internally Displaced People (IDP) camps in Afghanistan

Cassava Crop Counting

Counting from Sky -> A Large-scale Dataset for Remote Sensing Object Counting and A Benchmark Method

PSGCNet -> PSGCNet: A Pyramidal Scale and Global Context Guided Network for Dense Object Counting in Remote Sensing Images

Regression

Regression in remote sensing involves predicting continuous variables such as wind speed, tree height, or soil moisture from an image. Both classical machine learning and deep learning approaches can be used to accomplish this task. Classical machine learning utilizes feature engineering to extract numerical values from the input data, which are then used as input for a regression algorithm like linear regression. On the other hand, deep learning typically employs a convolutional neural network (CNN) to process the image data, followed by a fully connected neural network (FCNN) for regression. The FCNN is trained to map the input image to the desired output, providing predictions for the continuous variables of interest.
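
As a rough illustration of the deep learning route described above, the sketch below pairs a small convolutional feature extractor with a fully connected regression head trained with an MSE loss. The architecture, tile size and targets are arbitrary placeholders, not a reference to any specific project listed here.

```python
# Minimal sketch: CNN feature extractor followed by a fully connected
# regression head that predicts a single continuous value (e.g. wind speed).
import torch
import torch.nn as nn

class CNNRegressor(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(), nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.regressor(self.features(x))

model = CNNRegressor()
criterion = nn.MSELoss()                 # standard loss for regression
images = torch.rand(8, 3, 128, 128)      # placeholder batch of image tiles
targets = torch.rand(8, 1) * 100         # placeholder continuous targets
loss = criterion(model(images), targets)
loss.backward()                          # an optimizer step would follow in training
```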

python-windspeed -> Predicting windspeed of hurricanes from satellite images, uses CNN regression in keras

hurricane-wind-speed-cnn -> Predicting windspeed of hurricanes from satellite images, uses CNN regression in keras

GEDI-BDL -> Global canopy height regression and uncertainty estimation from GEDI LIDAR waveforms with deep ensembles

Global-Canopy-Height-Map -> Estimating Canopy Height at Scale (ICML2024)

HighResCanopyHeight -> code for Meta paper: Very high resolution canopy height maps from RGB imagery using self-supervised vision transformer and convolutional decoder trained on Aerial Lidar

Traffic density estimation as a regression problem instead of object detection -> inspired by paper: Traffic density estimation method from small satellite imagery: Towards frequent remote sensing of car traffic

OpticalWaveGauging_DNN -> Optical wave gauging using deep neural networks

satellite-pose-estimation -> adapts a ResNet50 model architecture to perform pose estimation on several series of satellite images (both real and synthetic)

Tropical Cyclone Wind Estimation Competition -> on RadiantEarth MLHub

DengueNet -> DengueNet: Dengue Prediction using Spatiotemporal Satellite Imagery for Resource-Limited Countries

tropical_cyclone_uq -> Uncertainty Aware Tropical Cyclone Wind Speed Estimation from Satellite Data

Cloud detection & removal

Clouds are a major issue in remote sensing images as they can obscure the underlying ground features. This hinders the accuracy and effectiveness of remote sensing analysis, as the obscured regions cannot be properly interpreted. In order to address this challenge, various techniques have been developed to detect clouds in remote sensing images. Both classical algorithms and deep learning approaches can be employed for cloud detection. Classical algorithms typically use threshold-based techniques and hand-crafted features to identify cloud pixels. However, these techniques can be limited in their accuracy and are sensitive to changes in image appearance and cloud structure. On the other hand, deep learning approaches leverage the power of convolutional neural networks (CNNs) to accurately detect clouds in remote sensing images. These models are trained on large datasets of remote sensing images, allowing them to learn and generalize the unique features and patterns of clouds. The generated cloud mask can be used to identify the cloud pixels and eliminate them from further analysis or, alternatively, cloud inpainting techniques can be used to fill in the gaps left by the clouds. This approach helps to improve the accuracy of remote sensing analysis and provides a clearer view of the ground, even in the presence of clouds. Image adapted from the paper 'Refined UNet Lite: End-to-End Lightweight Network for Edge-precise Cloud Detection'
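
For intuition, here is a minimal sketch of the classical threshold approach mentioned above: flag pixels whose blue-band reflectance exceeds a fixed value. The band index and threshold are illustrative assumptions only, and are no substitute for the validated algorithms and models listed below.

```python
# Minimal sketch of a classical threshold-based cloud mask: clouds are bright
# in the visible bands, so pixels whose blue-band reflectance exceeds a fixed
# threshold are flagged. The band order and threshold are illustrative only.
import numpy as np

def threshold_cloud_mask(image: np.ndarray, blue_band: int = 0, threshold: float = 0.3) -> np.ndarray:
    """image: (bands, height, width) array of surface reflectance in [0, 1]."""
    return image[blue_band] > threshold    # boolean mask, True = cloud

# Example on synthetic data
scene = np.random.rand(4, 512, 512) * 0.4  # placeholder 4-band reflectance
mask = threshold_cloud_mask(scene)
print(f"cloudy fraction: {mask.mean():.2%}")
```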

CloudSEN12 -> Sentinel-2 cloud dataset with a variety of models here

From this article on sentinelhub there are three popular classical algorithms that detect clouds by applying thresholds to multiple bands. The same article proposes semantic segmentation combined with a CNN for a cloud classifier (excellent review paper here), but states that this requires too much compute.

This article compares a number of ML algorithms: random forests, stochastic gradient descent, support vector machines and a Bayesian method.

Segmentation of Clouds in Satellite Images Using Deep Learning -> semantic segmentation using a Unet on the Kaggle 38-Cloud dataset

Cloud Detection in Satellite Imagery compares FPN+ResNet18 and CheapLab architectures on Sentinel-2 L1C and L2A imagery

Benchmarking Deep Learning models for Cloud Detection in Landsat-8 and Sentinel-2 images

Landsat-8 to Proba-V Transfer Learning and Domain Adaptation for Cloud detection

Multitemporal Cloud Masking in Google Earth Engine

s2cloudmask -> Sentinel-2 Cloud and Shadow Detection using Machine Learning

sentinel2-cloud-detector -> Sentinel Hub Cloud Detector for Sentinel-2 images in Python

dsen2-cr -> cloud removal in Sentinel-2 imagery using a deep residual neural network and SAR-optical data fusion, contains the model code, written in Python/Keras, as well as links to pre-trained checkpoints and the SEN12MS-CR dataset

pyatsa -> Python package implementing the Automated Time-Series Analysis method for masking clouds in satellite imagery developed by Zhu and Helmer 2018

decloud -> Decloud enables the training of various deep nets to remove clouds in optical images, using e.g. Sentinel-1 & 2

cloudless -> Deep learning pipeline for orbital satellite data for detecting clouds

Deep-Gapfill -> Official implementation of Optical image gap filling using deep convolutional autoencoder from optical and radar images

satellite-cloud-removal-dip -> Satellite cloud removal with Deep Image Prior, with paper

cloudFCN -> Python 3 package for Fully Convolutional Network development, specifically for cloud masking

Fmask -> Fmask (Function of mask) is used for automated clouds, cloud shadows, snow, and water masking for Landsats 4-9 and Sentinel 2 images, in Matlab. Also see PyFmask

How to use Deep Learning, PyTorch Lightning, and the Planetary Computer to predict cloud cover in satellite imagery

cloud-cover-winners -> winning submissions for the On Cloud N: Cloud Cover Detection Challenge

On-Cloud-N: Cloud Cover Detection Challenge - 19th Place Solution

ukis-csmask -> package to mask clouds in Sentinel-2, Landsat-8, Landsat-7 and Landsat-5 images

OpenSICDR -> long list of satellite image cloud detection resources

RS-Net -> A cloud detection algorithm for satellite imagery based on deep learning

Clouds-Segmentation-Project -> treats this as a 3-class problem (open clouds, closed clouds, no clouds); uses PyTorch on a dataset of IR & visual grayscale images

STGAN -> STGAN for Cloud Removal in Satellite Images

mcgan-cvprw2017-pytorch -> Filmy Cloud Removal on Satellite Imagery with Multispectral Conditional Generative Adversarial Nets

Cloud-Net: A semantic segmentation CNN for cloud detection -> an end-to-end cloud detection algorithm for Landsat 8 imagery, trained on 38-Cloud Training Set

fcd -> Fixed-Point GAN for Cloud Detection. A weakly-supervised approach, training with only image-level labels

CloudX-Net -> an efficient and robust architecture used for detection of clouds from satellite images

A simple cloud-detection walk-through using Convolutional Neural Network (CNN and U-Net) and fast.ai library

38Cloud-Medium -> Walk-through using u-net to detect clouds in satellite images with fast.ai

cloud_detection_using_satellite_data -> performed on Sentinel 2 data

Luojia1-Cloud-Detection -> Luojia-1 Satellite Visible Band Nighttime Imagery Cloud Detection

SEN12MS-CR-TS -> A Remote Sensing Data Set for Multi-modal Multi-temporal Cloud Removal

ES-CCGAN -> a dehazing method for remote sensing images, based on CycleGAN

Cloud_Classification_DL -> Classifying cloud organization patterns from satellite images using Deep Learning techniques (Mask R-CNN)

CNN-based-Cloud-Detection-Methods -> Understanding the Role of Receptive Field of Convolutional Neural Network for Cloud Detection in Landsat 8 OLI Imagery

cloud-removal-deploy -> flask app for cloud removal

CloudMattingGAN -> Generative Adversarial Training for Weakly Supervised Cloud Matting

km_predict -> KappaMask, or km-predict, is a cloud detector for Sentinel-2 Level-1C and Level-2A input products applied to S2 full image prediction

CDnet -> CNN-Based Cloud Detection for Remote Sensing Imagery

CDnetV2 -> CNN-Based Cloud Detection for Remote Sensing Imagery With Cloud-Snow Coexistence

grouped-features-alignment -> Unsupervised Domain Adaptation for Cloud Detection Based on Grouped Features Alignment and Entropy Minimization

Detecting Cloud Cover Via Sentinel-2 Satellite Data -> blog post on Benjamin Warner's Top-10 Percent Solution to DrivenData's On Cloud N Competition using fast.ai & a customized version of XResNeXt50. Repo

AISD -> Deeply supervised convolutional neural network for shadow detection based on a novel aerial shadow imagery dataset

CloudGAN -> Detecting and Removing Clouds from RGB-images using Image Inpainting

Using GANs to Augment Data for Cloud Image Segmentation Task

Cloud-Segmentation-from-Satellite-Imagery -> applied to Sentinel-2 dataset

HRC_WHU -> High-Resolution Cloud Detection Dataset comprising 150 RGB images with resolutions varying from 0.5 to 15 m across different global regions

MEcGANs -> Cloud Removal from Satellite Imagery using Multispectral Edge-filtered Conditional Generative Adversarial Networks

CloudXNet -> CloudX-net: A robust encoder-decoder architecture for cloud detection from satellite remote sensing images

cloud-buster -> Sentinel-2 L1C and L2A Imagery with Fewer Clouds

SatelliteCloudGenerator -> A PyTorch-based tool to generate clouds for satellite images

SEnSeI -> A python 3 package for developing sensor independent deep learning models for cloud masking in satellite imagery

cloud-detection-venus -> Using Convolutional Neural Networks for Cloud Detection on VENμS Images over Multiple Land-Cover Types

explaining_cloud_effects -> Explaining the Effects of Clouds on Remote Sensing Scene Classification

Clouds-Images-Segmentation -> Marine Stratocumulus Cloud-Type Classification from SEVIRI Using Convolutional Neural Networks

DeCloud-GAN -> DeCloud GAN: An Advanced Generative Adversarial Network for Removing Cloud Cover in Optical Remote Sensing Imagery

cloud_segmentation_comparative -> BenchCloudVision: A Benchmark Analysis of Deep Learning Approaches for Cloud Detection and Segmentation in Remote Sensing Imagery

PLFM-Clouds-Removal -> Spatio-Temporal SAR-Optical Data Fusion for Cloud Removal via a Deep Hierarchical Model

Cloud-removal-model-collection -> A collection of the existing end-to-end cloud removal models

SEnSeIv2 -> Sensor Independent Cloud and Shadow Masking with Ambiguous Labels and Multimodal Inputs

UnCRtainTS -> Uncertainty Quantification for Cloud Removal in Optical Satellite Time Series

U-TILISE -> A Sequence-to-sequence Model for Cloud Removal in Optical Satellite Time Series

Change detection

Change detection is a vital component of remote sensing analysis, enabling the monitoring of landscape changes over time. This technique can be applied to identify a wide range of changes, including land use changes, urban development, coastal erosion, and deforestation. Change detection can be performed on a pair of images taken at different times, or by analyzing multiple images collected over a period of time. It is important to note that while change detection is primarily used to detect changes in the landscape, it can also be influenced by the presence of clouds and shadows. These dynamic elements can alter the appearance of the image, leading to false positives in change detection results. Therefore, it is essential to consider the impact of clouds and shadows on change detection analysis, and to employ appropriate methods to mitigate their influence.
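
Before diving into the deep learning methods below, a minimal unsupervised baseline helps illustrate the idea: difference two co-registered images, compute the per-pixel change magnitude, and threshold it. The threshold rule and synthetic data here are illustrative assumptions, not any of the listed methods.

```python
# Minimal sketch: unsupervised change detection by differencing two co-registered
# images and thresholding the change magnitude. Real pipelines add radiometric
# normalisation and cloud/shadow masking to suppress false positives.
import numpy as np

def change_map(img_t1: np.ndarray, img_t2: np.ndarray, k: float = 2.0) -> np.ndarray:
    """img_t1, img_t2: (bands, height, width) arrays from two dates."""
    diff = img_t2.astype(float) - img_t1.astype(float)
    magnitude = np.sqrt((diff ** 2).sum(axis=0))        # per-pixel change magnitude
    threshold = magnitude.mean() + k * magnitude.std()  # simple adaptive threshold
    return magnitude > threshold                        # True = changed pixel

before = np.random.rand(4, 256, 256)   # placeholder multispectral image, date 1
after = before.copy()
after[:, 100:140, 100:140] += 0.5      # simulate a changed patch
print(f"changed pixels: {change_map(before, after).sum()}")
```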

awesome-remote-sensing-change-detection lists many datasets and publications

Change-Detection-Review -> A review of change detection methods, including code and open data sets for deep learning

Change Detection using Siamese Networks

STANet -> STANet for remote sensing image change detection

UNet-based-Unsupervised-Change-Detection -> A convolutional neural network (CNN) and semantic segmentation is implemented to detect the changes between the images, as well as classify the changes into the correct semantic class

BIT_CD -> Official Pytorch Implementation of Remote Sensing Image Change Detection with Transformers

Unstructured-change-detection-using-CNN

Siamese neural network to detect changes in aerial images -> uses Keras and VGG16 architecture

Change Detection in 3D: Generating Digital Elevation Models from Dove Imagery

QGIS plugin for applying change detection algorithms on high resolution satellite imagery

LamboiseNet -> Master thesis about change detection in satellite imagery using Deep Learning

Fully Convolutional Siamese Networks for Change Detection

Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks -> used the Onera Satellite Change Detection (OSCD) dataset

IAug_CDNet -> Official Pytorch Implementation of Adversarial Instance Augmentation for Building Change Detection in Remote Sensing Images

dpm-rnn-public -> Code implementing a damage mapping method combining satellite data with deep learning

SenseEarth2020-ChangeDetection -> 1st place solution to the Satellite Image Change Detection Challenge hosted by SenseTime; predictions of five HRNet-based segmentation models are ensembled, serving as pseudo labels of unchanged areas

KPCAMNet -> Python implementation of the paper Unsupervised Change Detection in Multi-temporal VHR Images Based on Deep Kernel PCA Convolutional Mapping Network

CDLab -> benchmarking deep learning-based change detection methods.

Siam-NestedUNet -> SNUNet-CD: A Densely Connected Siamese Network for Change Detection of VHR Images

SUNet-change_detection -> Implementation of paper SUNet: Change Detection for Heterogeneous Remote Sensing Images from Satellite and UAV Using a Dual-Channel Fully Convolution Network

Self-supervised Change Detection in Multi-view Remote Sensing Images

MFPNet -> Remote Sensing Change Detection Based on Multidirectional Adaptive Feature Fusion and Perceptual Similarity

GitHub for the DIUx xView Detection Challenge -> The xView2 Challenge focuses on automating the process of assessing building damage after a natural disaster

DASNet -> Dual attentive fully convolutional siamese networks for change detection of high-resolution satellite images

Self-Attention for Raw Optical Satellite Time Series Classification

planet-movement -> Find and process Planet image pairs to highlight object movement

temporal-cluster-matching -> detecting change in structure footprints from time series of remotely sensed imagery

autoRIFT -> fast and intelligent algorithm for finding the pixel displacement between two images

DSAMNet -> A Deeply Supervised Attention Metric-Based Network and an Open Aerial Image Dataset for Remote Sensing Change Detection

SRCDNet -> Super-resolution-based Change Detection Network with Stacked Attention Module for Images with Different Resolutions. SRCDNet is designed to learn and predict change maps from bi-temporal images with different resolutions

Land-Cover-Analysis -> Land Cover Change Detection using Satellite Image Segmentation

A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images

Satellite-Image-Alignment-Differencing-and-Segmentation

Change Detection in Multi-temporal Satellite Images -> uses Principal Component Analysis (PCA) and K-means clustering

Unsupervised Change Detection Algorithm using PCA and K-Means Clustering -> in Matlab but has paper

ChangeFormer -> A Transformer-Based Siamese Network for Change Detection. Uses transformer architecture to address the limitations of CNN in handling multi-scale long-range details. Demonstrates that ChangeFormer captures much finer details compared to the other SOTA methods, achieving better performance on benchmark datasets

Heterogeneous_CD -> Heterogeneous Change Detection in Remote Sensing Images

ChangeDetectionProject -> Trying out Active Learning with deep CNNs for change detection on remote sensing data

DSFANet -> Unsupervised Deep Slow Feature Analysis for Change Detection in Multi-Temporal Remote Sensing Images

siamese-change-detection -> Targeted synthesis of multi-temporal remote sensing images for change detection using siamese neural networks

Bi-SRNet -> Bi-Temporal Semantic Reasoning for the Semantic Change Detection in HR Remote Sensing Images

SiROC -> Spatial Context Awareness for Unsupervised Change Detection in Optical Satellite Images. Applied to Sentinel-2 and high-resolution Planetscope imagery on four datasets

DSMSCN -> Tensorflow implementation for Change Detection in Multi-temporal VHR Images Based on Deep Siamese Multi-scale Convolutional Neural Networks

RaVAEn -> a lightweight, unsupervised approach for change detection in satellite data based on Variational Auto-Encoders (VAEs) with the specific purpose of on-board deployment. It flags changed areas to prioritise for downlink, shortening the response time

SemiCD -> Revisiting Consistency Regularization for Semi-supervised Change Detection in Remote Sensing Images. Achieves the performance of supervised CD even with access to as little as 10% of the annotated training data

FCCDN_pytorch -> FCCDN: Feature Constraint Network for VHR Image Change Detection. Uses the LEVIR-CD building change detection dataset

INLPG_Python -> Structure Consistency based Graph for Unsupervised Change Detection with Homogeneous and Heterogeneous Remote Sensing Images

NSPG_Python -> Nonlocal patch similarity based heterogeneous remote sensing change detection

DS_UNet -> Sentinel-1 and Sentinel-2 Data Fusion for Urban Change Detection using a Dual Stream U-Net, uses Onera Satellite Change Detection dataset

SiameseSSL -> Urban change detection with a Dual-Task Siamese network and semi-supervised learning. Uses SpaceNet 7 dataset

CD-SOTA-methods -> Remote sensing change detection: State-of-the-art methods and available datasets

multimodalCD_ISPRS21 -> Fusing Multi-modal Data for Supervised Change Detection

Unsupervised-CD-in-SITS-using-DL-and-Graphs -> Unsupervised Change Detection Analysis in Satellite Image Time Series using Deep Learning Combined with Graph-Based Approaches

LSNet -> Extremely Light-Weight Siamese Network For Change Detection in Remote Sensing Image

Change-Detection-in-Remote-Sensing-Images -> using PCA & K-means

End-to-end-CD-for-VHR-satellite-image -> End-to-End Change Detection for High Resolution Satellite Images Using Improved UNet++

Semantic-Change-Detection -> SCDNET: A novel convolutional network for semantic change detection in high resolution optical remote sensing imagery

ERCNN-DRS_urban_change_monitoring -> Neural Network-Based Urban Change Monitoring with Deep-Temporal Multispectral and SAR Remote Sensing Data

EGRCNN -> Edge-guided Recurrent Convolutional Neural Network for Multi-temporal Remote Sensing Image Building Change Detection

Unsupervised-Remote-Sensing-Change-Detection -> An Unsupervised Remote Sensing Change Detection Method Based on Multiscale Graph Convolutional Network and Metric Learning

CropLand-CD -> A CNN-transformer Network with Multi-scale Context Aggregation for Fine-grained Cropland Change Detection

contrastive-surface-image-pretraining -> Supervising Remote Sensing Change Detection Models with 3D Surface Semantics

dcvaVHROptical -> Unsupervised Deep Change Vector Analysis for Multiple-Change Detection in VHR Images

hyperdimensionalCD -> Change Detection in Hyperdimensional Images Using Untrained Models

FCD-GAN-pytorch -> Fully Convolutional Change Detection Framework with Generative Adversarial Network (FCD-GAN) is a framework for change detection in multi-temporal remote sensing images

DARNet-CD -> A Densely Attentive Refinement Network for Change Detection Based on Very-High-Resolution Bitemporal Remote Sensing Images

xView2_Vulcan -> Damage assessment using pre and post orthoimagery. Modified + productionized model based on the first-place model from the xView2 challenge.

ESCNet -> An End-to-End Superpixel-Enhanced Change Detection Network for Very-High-Resolution Remote Sensing Images

ForestCoverChange -> Detecting and Predicting Forest Cover Change in Pakistani Areas Using Remote Sensing Imagery

deforestation-detection -> Deep Learning for High-Frequency Change Detection in Ukrainian Forest Ecosystem with Sentinel-2

SentinelClearcutDetection -> Scripts for deforestation detection on the Sentinel-2 Level-A images

clearcut_detection -> research & web-service for clearcut detection

CDRL -> Unsupervised Change Detection Based on Image Reconstruction Loss

ddpm-cd -> Remote Sensing Change Detection (Segmentation) using Denoising Diffusion Probabilistic Models

Remote-sensing-time-series-change-detection -> Graph-based block-level urban change detection using Sentinel-2 time series

austin-ml-change-detection-demo -> A change detection demo for the Austin area using a pre-trained PyTorch model scaled with Dask on Planet imagery

dfc2021-msd-baseline -> Multitemporal Semantic Change Detection track of the 2021 IEEE GRSS Data Fusion Competition

CorrFusionNet -> Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion

ChangeDetectionPCAKmeans -> Unsupervised Change Detection in Satellite Images Using Principal Component Analysis and k-Means Clustering.

IRCNN -> IRCNN: An Irregular-Time-Distanced Recurrent Convolutional Neural Network for Change Detection in Satellite Time Series

UTRNet -> An Unsupervised Time-Distance-Guided Convolutional Recurrent Network for Change Detection in Irregularly Collected Images

open-cd -> an open source change detection toolbox based on a series of open source general vision task tools

Tiny_model_4_CD -> TINYCD: A (Not So) Deep Learning Model For Change Detection. Uses LEVIR-CD & WHU-CD datasets

FHD -> Feature Hierarchical Differentiation for Remote Sensing Image Change Detection

Change detection with Raster Vision -> blog post with Colab notebook

building-expansion -> Enhancing Environmental Enforcement with Near Real-Time Monitoring: Likelihood-Based Detection of Structural Expansion of Intensive Livestock Farms

SaDL_CD -> Semantic-aware Dense Representation Learning for Remote Sensing Image Change Detection

EGCTNet_pytorch -> Building Change Detection Based on an Edge-Guided Convolutional Neural Network Combined with a Transformer

S2-cGAN -> S2-cGAN: Self-Supervised Adversarial Representation Learning for Binary Change Detection in Multispectral Images

A-loss-function-for-change-detection -> UAL: Unchanged Area Loss-Function for Change Detection Networks

IEEE_TGRS_SSTFormer -> Spectral–Spatial–Temporal Transformers for Hyperspectral Image Change Detection

DMINet -> Change Detection on Remote Sensing Images Using Dual-Branch Multilevel Intertemporal Network

AFCF3D-Net -> Adjacent-level Feature Cross-Fusion with 3D CNN for Remote Sensing Image Change Detection

DSAHRNet -> A Deeply Attentive High-Resolution Network for Change Detection in Remote Sensing Images

RDPNet -> RDP-Net: Region Detail Preserving Network for Change Detection

BGAAE_CD -> Bipartite Graph Attention Autoencoders for Unsupervised Change Detection Using VHR Remote Sensing Images

Unsupervised-Change-Detection -> Unsupervised Change Detection in Satellite Images Using Principal Component Analysis and k-Means Clustering

Metric-CD -> Deep Metric Learning for Unsupervised Change Detection in Remote Sensing Images

HANet-CD -> HANet: A hierarchical attention network for change detection with bi-temporal very-high-resolution remote sensing images

SRGCAE -> Unsupervised Multimodal Change Detection Based on Structural Relationship Graph Representation Learning

change_detection_onera_baselines -> Siamese version of U-Net baseline model

SiamCRNN -> Change Detection in Multisource VHR Images via Deep Siamese Convolutional Multiple-Layers Recurrent Neural Network

Graph-based methods for change detection in remote sensing images -> Graph Learning Based on Signal Smoothness Representation for Homogeneous and Heterogeneous Change Detection

AR-CDNet -> Towards Accurate and Reliable Change Detection of Remote Sensing Images via Knowledge Review and Online Uncertainty Estimation

CICNet -> Compact Intertemporal Coupling Network for Remote Sensing Change Detection

BGINet -> Remote Sensing Image Change Detection with Graph Interaction

DSNUNet -> DSNUNet: An Improved Forest Change Detection Network by Combining Sentinel-1 and Sentinel-2 Images

Forest-CD -> Forest-CD: Forest Change Detection Network Based on VHR Images

S3Net_CD -> Superpixel-Guided Self-Supervised Learning Network for Multitemporal Image Change Detection

T-UNet -> T-UNet: Triplet UNet for Change Detection in High-Resolution Remote Sensing Images

UCDFormer -> UCDFormer: Unsupervised Change Detection Using a Transformer-driven Image Translation

satellite-change-events -> Change Event Dataset for Discovery from Spatio-temporal Remote Sensing Imagery, uses Sentinel 2 CaiRoad & CalFire datasets

CACo -> Change-Aware Sampling and Contrastive Learning for Satellite Images

LightCDNet -> LightCDNet: Lightweight Change Detection Network Based on VHR Images

OpenMineChangeDetection -> Characterising Open Cast Mining from Satellite Data (Sentinel 2), implements TinyCD, LSNet & DDPM-CD

multi-task-L-UNet -> A Deep Multi-Task Learning Framework Coupling Semantic Segmentation and Fully Convolutional LSTM Networks for Urban Change Detection. Applied to SpaceNet7 dataset

urban_change_detection -> Detecting Urban Changes With Recurrent Neural Networks From Multitemporal Sentinel-2 Data. fabric is another implementation

UNetLSTM -> Detecting Urban Changes With Recurrent Neural Networks From Multitemporal Sentinel-2 Data

SDACD -> An End-to-end Supervised Domain Adaptation Framework for Cross-domain Change Detection

CycleGAN-Based-DA-for-CD -> CycleGAN-based Domain Adaptation for Deforestation Detection

CGNet-CD -> Change Guiding Network: Incorporating Change Prior to Guide Change Detection in Remote Sensing Imagery

PA-Former -> PA-Former: Learning Prior-Aware Transformer for Remote Sensing Building Change Detection

AERNet -> AERNet: An Attention-Guided Edge Refinement Network and a Dataset for Remote Sensing Building Change Detection (HRCUS-CD)

S1GFlood-Detection -> DAM-Net: Global Flood Detection from SAR Imagery Using Differential Attention Metric-Based Vision Transformers. Includes S1GFloods dataset

Changen -> Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process

TTP -> Time Travelling Pixels: Bitemporal Features Integration with Foundation Model for Remote Sensing Image Change Detection

SAM-CD -> Adapting Segment Anything Model for Change Detection in HR Remote Sensing Images

SCanNet -> Joint Spatio-Temporal Modeling for Semantic Change Detection in Remote Sensing Images

ELGC-Net -> Efficient Local-Global Context Aggregation for Remote Sensing Change Detection

Official_Remote_Sensing_Mamba -> RS-Mamba for Large Remote Sensing Image Dense Prediction

ChangeMamba -> Remote Sensing Change Detection with Spatio-Temporal State Space Model

ClearSCD -> Comprehensively leveraging semantics and change relationships for semantic change detection in high spatial resolution remote sensing imagery

RSCaMa -> Remote Sensing Image Change Captioning with State Space Model

ChangeBind -> A Hybrid Change Encoder for Remote Sensing Change Detection

OctaveNet -> An efficient multi-scale pseudo-siamese network for change detection in remote sensing images

MaskCD -> A Remote Sensing Change Detection Network Based on Mask Classification

I3PE -> Exchange means change: an unsupervised single-temporal change detection framework based on intra- and inter-image patch exchange

BDANet -> Multiscale Convolutional Neural Network with Cross-directional Attention for Building Damage Assessment from Satellite Images

BAN -> A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection

ubdd -> Learning Efficient Unsupervised Satellite Image-based Building Damage Detection, uses xView2

SGSLN -> Exchanging Dual-Encoder–Decoder: A New Strategy for Change Detection With Semantic Guidance and Spatial Localization

ChangeViT -> Unleashing Plain Vision Transformers for Change Detection

pytorch-change-models -> out-of-box contemporary spatiotemporal change model implementations, standard metrics, and datasets

Time series

The analysis of time series observations in remote sensing data has numerous applications, including enhancing the accuracy of classification models and forecasting future patterns and events. Note: since classifying crops and predicting crop yield are such prominent use cases for time series data, these tasks have dedicated sections after this one.
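
To make the setup concrete, the sketch below classifies a per-pixel time series of spectral bands with a small 1D temporal CNN, in the spirit of the temporal convolution approaches listed below; all dimensions, classes and data are placeholders.

```python
# Minimal sketch: a 1D temporal CNN that classifies a per-pixel time series of
# spectral bands (e.g. one year of observations) into land-cover classes.
import torch
import torch.nn as nn

NUM_BANDS, NUM_DATES, NUM_CLASSES = 10, 24, 5   # illustrative dimensions

model = nn.Sequential(
    nn.Conv1d(NUM_BANDS, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1),                    # pool over the time dimension
    nn.Flatten(),
    nn.Linear(64, NUM_CLASSES),
)

pixels = torch.rand(16, NUM_BANDS, NUM_DATES)   # batch of per-pixel time series
logits = model(pixels)                          # (16, NUM_CLASSES) class scores
print(logits.argmax(dim=1))
```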

LANDSAT Time Series Analysis for Multi-temporal Land Cover Classification using Random Forest

temporalCNN -> Temporal Convolutional Neural Network for the Classification of Satellite Image Time Series

pytorch-psetae -> Satellite Image Time Series Classification with Pixel-Set Encoders and Temporal Self-Attention

satflow -> optical flow models for predicting future satellite images from current and past ones

esa-superresolution-forecasting -> Forecasting air pollution using ESA Sentinel-5p data, and an encoder-decoder convolutional LSTM neural network architecture

lightweight-temporal-attention-pytorch -> Light Temporal Attention Encoder (L-TAE) for satellite image time series

dtwSat -> Time-Weighted Dynamic Time Warping for satellite image time series analysis

MTLCC -> Multitemporal Land Cover Classification Network. A recurrent neural network approach to encode multi-temporal data for land cover classification

PWWB -> Real-Time Spatiotemporal Air Pollution Prediction with Deep Convolutional LSTM through Satellite Image Analysis

spaceweather -> predicting geomagnetic storms from satellite measurements of the solar wind and solar corona, uses LSTMs

Forest_wildfire_spreading_convLSTM -> Modeling of the spreading of forest wildfire using a neural network with ConvLSTM cells. Prediction 3-days forward

ConvTimeLSTM -> Extension of ConvLSTM and Time-LSTM for irregularly spaced images, appropriate for Remote Sensing

dl-time-series -> Deep Learning algorithms applied to characterization of Remote Sensing time-series

tpe -> Generalized Classification of Satellite Image Time Series With Thermal Positional Encoding

wildfire_forecasting -> Deep Learning Methods for Daily Wildfire Danger Forecasting. Uses ConvLSTM

satellite_image_forecasting -> predict future satellite images from past ones using features such as precipitation and elevation maps. Entry for the EarthNet2021 challenge

Deep Learning for Cloud Gap-Filling on Normalized Difference Vegetation Index using Sentinel Time-Series -> A CNN-RNN based model that identifies correlations between optical and SAR data and exports dense Normalized Difference Vegetation Index (NDVI) time-series of a static 6-day time resolution and can be used for Events Detection tasks

DeepSatModels -> ViTs for SITS: Vision Transformers for Satellite Image Time Series

Presto -> Lightweight, Pre-trained Transformers for Remote Sensing Timeseries

LULC mapping using time series data & spectral bands -> uses 1D convolutions that learn from time-series data. Accompanies blog post: Time-Traveling Pixels: A Journey into Land Use Modeling

hurricane-net -> A deep learning framework for forecasting Atlantic hurricane trajectory and intensity.

CAPES -> Construction changes are detected using the U-net model and satellite time series

Exchanger4SITS -> Rethinking the Encoding of Satellite Image Time Series

Rapid Wildfire Hotspot Detection Using Self-Supervised Learning on Temporal Remote Sensing Data

stenn-pytorch -> A Spatio-temporal Encoding Neural Network for Semantic Segmentation of Satellite Image Time Series

Crop classification

Crop classification in remote sensing is the identification and mapping of different crops in images or sequences of images. It aims to provide insight into the distribution and composition of crops in a specific area, with applications that include monitoring crop growth and evaluating crop damage. Both traditional machine learning methods, such as decision trees and support vector machines, and deep learning techniques, such as convolutional neural networks (CNNs), can be used to perform crop classification. The optimal method depends on the size and complexity of the dataset, the desired accuracy, and the available computational resources. However, the success of crop classification relies heavily on the quality and resolution of the input data, as well as the availability of labeled training data. Image source: High resolution satellite imaging sensors for precision agriculture by Chenghai Yang
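
As a minimal example of the classical machine learning route mentioned above, the sketch below trains a random forest on flattened per-pixel spectral-temporal features; the feature layout, labels and data are synthetic placeholders rather than any benchmark dataset.

```python
# Minimal sketch: classical crop classification with a random forest on
# per-pixel features (flattened band x date values). Labels and data are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

n_pixels, n_bands, n_dates = 5000, 10, 12
X = np.random.rand(n_pixels, n_bands * n_dates)   # per-pixel spectral-temporal features
y = np.random.randint(0, 4, n_pixels)             # placeholder crop-type labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print(f"held-out accuracy: {clf.score(X_test, y_test):.2f}")   # ~chance level on random data
```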

Classification of Crop Fields through Satellite Image Time Series -> using pytorch-psetae & Sentinel-2 data

CropDetectionDL -> using GRU-net, First place solution for Crop Detection from Satellite Imagery competition organized by CV4A workshop at ICLR 2020

Radiant-Earth-Spot-the-Crop-Challenge -> The main objective of this challenge was to use time-series of Sentinel-2 multi-spectral data to classify crops in the Western Cape of South Africa. The challenge was to build a machine learning model to predict crop type classes for the test dataset

CropMappingInterpretation -> An interpretation pipeline towards understanding multi-temporal deep learning approaches for crop mapping

timematch -> A method to perform unsupervised cross-region adaptation of crop classifiers trained with satellite image time series. We also introduce an open-access dataset for cross-region adaptation with SITS from four different regions in Europe

elects -> End-to-End Learned Early Classification of Time Series for In-Season Crop Type Mapping

3d-fpn-and-time-domain -> Sentinel 2 Time Series Analysis with 3D Feature Pyramid Network and Time Domain Class Activation Intervals for Crop Mapping

in-season-and-dynamic-crop-mapping -> In-season and dynamic crop mapping using 3D convolution neural networks and sentinel-2 time series, uses the Lombardy crop dataset

MultiviewCropClassification -> A Comparative Assessment of Multi-view Fusion Learning for Crop Classification

Detection of manure application on crop fields leveraging satellite data and Machine Learning

StressNet: A spatial-spectral-temporal deformable attention-based framework for water stress classification in maize -> Water Stress Classification on Multispectral data of Maize captured by UAV

XAI4EO -> Towards Explainable AI4EO: an explainable DL approach for crop type mapping using SITS

Crop yield

Crop yield is a crucial metric in agriculture, as it determines the productivity and profitability of a farm. It is defined as the amount of crops produced per unit area of land and is influenced by a range of factors including soil fertility, weather conditions, the type of crop grown, and pest and disease control. By utilizing time series of satellite images, it is possible to perform accurate crop type classification and take advantage of the seasonal variations specific to certain crops. This information can be used to optimize crop management practices and ultimately improve crop yield. However, to achieve accurate results, it is essential to consider the quality and resolution of the input data, as well as the availability of labeled training data. Appropriate pre-processing and feature extraction techniques must also be employed.
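
The sketch below shows one simple formulation of the task: regressing per-field yield from a seasonal NDVI time series with gradient-boosted trees. The features, yields and units are synthetic placeholders; real pipelines typically add weather, soil and management covariates.

```python
# Minimal sketch: predicting per-field crop yield from a seasonal NDVI time series
# with gradient-boosted trees. The features and yields here are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

n_fields, n_dates = 1000, 20
ndvi_series = np.random.rand(n_fields, n_dates)   # placeholder NDVI per field per date
yields = ndvi_series.mean(axis=1) * 8 + np.random.randn(n_fields) * 0.5  # synthetic yields (t/ha)

X_train, X_test, y_train, y_test = train_test_split(
    ndvi_series, yields, test_size=0.2, random_state=0)
reg = GradientBoostingRegressor().fit(X_train, y_train)
print(f"R^2 on held-out fields: {reg.score(X_test, y_test):.2f}")
```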

Crop yield Prediction with Deep Learning -> Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data

Deep-Transfer-Learning-Crop-Yield-Prediction

Crop-Yield-Prediction-using-ML -> A simple web application that provides farmers/users an approximation of how much crop yield will be produced for the given inputs

Building a Crop Yield Prediction App in Senegal Using Satellite Imagery and Jupyter Voila

Crop Yield Prediction Using Deep Neural Networks and LSTM

Deep transfer learning techniques for crop yield prediction, published in COMPASS 2018

Understanding crop yield predictions from CNNs

Advanced Deep Learning Techniques for Predicting Maize Crop Yield using Sentinel-2 Satellite Imagery

pycrop-yield-prediction -> Deep Gaussian Process for Crop Yield Prediction

PredictYield -> using data scraped from Google Earth Engine, this predicts the yield of Corn, Soybean, and Wheat in the USA with Keras

Crop-Yield-Prediction-and-Estimation-using-Time-series-remote-sensing-data

Yield-Prediction-Using-Sentinel-Data

SPACY -> Satellite Prediction of Aggregate Corn Yield

cropyieldArticle -> Scalable Crop Yield Prediction with Sentinel-2 Time Series and Temporal Convolutional Network

CNN-RNN-Yield-Prediction -> A CNN-RNN Framework for Crop Yield Prediction

Yield-Prediction-DNN -> Crop Yield Prediction Using Deep Neural Networks

MMST-ViT -> MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer. This paper utilizes the Tiny CropNet dataset

Greenearthnet -> Multi-modal learning for geospatial vegetation forecasting

crop-forecasting -> Predicting rice field yields

SICKLE -> A Multi-Sensor Satellite Imagery Dataset Annotated with Multiple Key Cropping Parameters. Baseline solutions: U-TAE, U-Net3D and ConvLSTM

Wealth and economic activity

The traditional approach of collecting economic data through ground surveys is a time-consuming and resource-intensive process. However, advancements in satellite technology and machine learning offer an alternative solution. By utilizing satellite imagery and applying machine learning algorithms, it is possible to obtain accurate and current information on economic activity with greater efficiency. This shift towards satellite imagery-based forecasting not only provides cost savings but also offers a wider and more comprehensive perspective of economic activity. As a result, it is poised to become a valuable asset for both policymakers and businesses.

Using publicly available satellite imagery and deep learning to understand economic well-being in Africa, Nature Comms 22 May 2020 -> Used CNN on Landsat imagery (night & day) to predict asset wealth of African villages

satellite_led_liverpool -> Remote Sensing-Based Measurement of Living Environment Deprivation - Improving Classical Approaches with Machine Learning

Predicting_Energy_Consumption_With_Convolutional_Neural_Networks

SustainBench -> Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning

Measuring the Impacts of Poverty Alleviation Programs with Satellite Imagery and Deep Learning

Building a Spatial Model to Classify Global Urbanity Levels -> estimate global urbanity levels from population data, nighttime lights and road networks

deeppop -> Deep Learning Approach for Population Estimation from Satellite Imagery, also on Github

Estimating telecoms demand in areas of poor data availability

satimage -> Code and models for the manuscript "Predicting Poverty and Developmental Statistics from Satellite Images using Multi-task Deep Learning". Predict the main material of a roof, source of lighting and source of drinking water for properties, from satellite imagery

africa_poverty -> Using publicly available satellite imagery and deep learning to understand economic well-being in Africa

Predicting-Poverty -> Combining satellite imagery and machine learning to predict poverty, in PyTorch

income-prediction -> Predicting average yearly income based on satellite imagery using CNNs, uses pytorch

urban_score -> Learning to score economic development from satellite imagery

READ -> Lightweight and robust representation of economic scales from satellite imagery

Slum-classification -> Binary classification on a very high-resolution satellite image in case of mapping informal settlements using unet

Predicting_Poverty -> uses daytime & luminosity of nighttime satellite images

Cancer-Prevalence-Satellite-Images -> Predict Health Outcomes from Features of Satellite Images

Mapping Poverty in Bangladesh with Satellite Images and Deep Learning -> combines health data with OpenStreetMap data & nighttime and daytime satellite imagery

Population Estimation from Satellite Imagery

Deep_Learning_Satellite_Imd -> Using Deep Learning on Satellite Imagery to predict population and economic indicators

Disaster response

Remote sensing images are used in disaster response to identify and assess damage to an area. This imagery can be used to detect buildings that are damaged or destroyed, identify roads and road networks that are blocked, determine the size and shape of a disaster area, and identify areas that are at risk of flooding. Remote sensing images can also be used to detect and monitor the spread of forest fires and monitor vegetation health. Also check out the sections on change detection and water/fire/building segmentation.

DisaVu -> combines building & damage detection and provides an app for viewing predictions

Soteria -> uses machine learning with satellite imagery to map natural disaster impacts for faster emergency response

DisasterHack -> Wildfire Mitigation: Computer Vision Identification of Hazard Fuels Using Landsat

forestcasting -> Forest fire prediction powered by analytics

Machine Learning-based Damage Assessment for Disaster Relief on Google AI blog -> uses object detection to locate buildings, then a classifier to determine if a building is damaged. Challenge of generalising due to small dataset

hurricane_damage -> Post-hurricane structure damage assessment based on aerial imagery with CNN

rescue -> code of the paper: Attention to fires: multi-channel deep-learning models for wildfire severity prediction

Disaster-Classification -> A disaster classification model to predict the type of disaster given an input image

Coarse-to-fine weakly supervised learning method for green plastic cover segmentation

Detection of destruction in satellite imagery

BDD-Net -> A General Protocol for Mapping Buildings Damaged by a Wide Range of Disasters Based on Satellite Imagery

building-segmentation-disaster-resilience -> 2nd place solution in the Open Cities AI Challenge: Segmenting Buildings for Disaster Resilience

Flooding Damage Detection from Post-Hurricane Satellite Imagery Based on Convolutional Neural Networks

IBM-Disaster-Response-Hack -> identifying optimal terrestrial routes through calamity-stricken areas. Satellite image data informs road condition assessment and obstruction detection

Hurricane-Damage-Detection -> Waterloo's Hack the North 2020++ submission. A convolutional neural network model used to detect hurricane damage in RGB satellite images

Satellite Image Analysis with fast.ai for Disaster Recovery

shackleton -> leverages remote sensing imagery and machine learning techniques to provide insights into various transportation and evacuation scenarios in an interactive dashboard that conducts real-time computation

ai-vegetation-fuel -> Predicting Fuel Load from earth observation data using Machine Learning, using LightGBM & CatBoost

AI Helps Detect Disaster Damage From Satellite Imagery -> NVIDIA blog post

Turkey-Earthquake-2023-Building-Change-Detection -> The repository contains building footprints derived from Maxar open data imagery and change detection results by blackshark-ai

MS4D-Net-Building-Damage-Assessment -> MS4D-Net: Multitask-Based Semi-Supervised Semantic Segmentation Framework with Perturbed Dual Mean Teachers for Building Damage Assessment from High-Resolution Remote Sensing Imagery

DAHiTra -> Large-scale Building Damage Assessment using a Novel Hierarchical Transformer Architecture on Satellite Images. Uses xView2 xBD dataset

skai -> a machine learning based tool from Google for performing automatic building damage assessments on aerial imagery of disaster sites

building-damage-assessment-cnn-siamese -> from the Microsoft Ai for Good lab

Super-resolution

Super-resolution is a technique aimed at improving the resolution of an imaging system. This process can be applied prior to other image processing steps to increase the visibility of small objects or boundaries. Despite its potential benefits, the use of super-resolution is controversial due to the possibility of introducing artifacts that could be mistaken for real features. Super-resolution techniques are broadly categorized into two groups: single image super-resolution (SISR) and multi-image super-resolution (MISR). SISR focuses on enhancing the resolution of a single image, while MISR utilizes multiple images of the same scene to create a high-resolution output. Each approach has its own advantages and limitations, and the choice of method depends on the specific application and desired outcome.
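
For orientation, the sketch below is a tiny SISR model using sub-pixel convolution (PixelShuffle) for 2x upscaling, loosely in the spirit of ESPCN-style approaches; the layer sizes and scale factor are arbitrary choices, not a reference implementation of any project listed here.

```python
# Minimal sketch: a tiny single-image super-resolution network using sub-pixel
# convolution (PixelShuffle) for 2x upscaling.
import torch
import torch.nn as nn

SCALE = 2

model = nn.Sequential(
    nn.Conv2d(3, 64, 5, padding=2), nn.ReLU(),
    nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3 * SCALE ** 2, 3, padding=1),
    nn.PixelShuffle(SCALE),              # rearranges channels into a 2x larger image
)

low_res = torch.rand(1, 3, 128, 128)     # placeholder low-resolution tile
high_res = model(low_res)                # (1, 3, 256, 256) super-resolved output
print(high_res.shape)
```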

Multi image super-resolution (MISR)

Note that nearly all the MISR publications resulted from the PROBA-V Super Resolution competition

deepsum -> Deep neural network for Super-resolution of Unregistered Multitemporal images (ESA PROBA-V challenge)

3DWDSRNet -> Satellite Image Multi-Frame Super Resolution (MISR) Using 3D Wide-Activation Neural Networks

RAMS -> Multi-Image Super Resolution of Remotely Sensed Images Using Residual Attention Deep Neural Networks

TR-MISR -> Transformer-based MISR framework for the PROBA-V super-resolution challenge. With paper

HighRes-net -> Pytorch implementation of HighRes-net, a neural network for multi-frame super-resolution, trained and tested on the European Space Agency’s Kelvin competition

ProbaVref -> Repurposing the Proba-V challenge for reference-aware super resolution

The missing ingredient in deep multi-temporal satellite image super-resolution -> Permutation invariance harnesses the power of ensembles in a single model, with repo piunet

MSTT-STVSR -> Space-time Super-resolution for Satellite Video: A Joint Framework Based on Multi-Scale Spatial-Temporal Transformer, JAG, 2022

Self-Supervised Super-Resolution for Multi-Exposure Push-Frame Satellites

DDRN -> Deep Distillation Recursive Network for Video Satellite Imagery Super-Resolution

worldstrat -> SISR and MISR implementations of SRCNN

MISR-GRU -> Pytorch implementation of MISR-GRU, a deep neural network for multi image super-resolution (MISR), for ProbaV Super Resolution Competition

MSDTGP -> Satellite Video Super-Resolution via Multiscale Deformable Convolution Alignment and Temporal Grouping Projection

proba-v-super-resolution-challenge -> Solution to ESA's satellite imagery super resolution challenge

PROBA-V-Super-Resolution -> solution using a custom deep learning architecture

satlas-super-resolution -> Satlas Super Resolution: the model is an adaptation of ESRGAN, with changes that allow the input to be a time series of Sentinel-2 images

MISR Remote Sensing SRGAN -> PyTorch SRGAN for RGB Remote Sensing imagery, performing both SISR and MISR. MISR implementation inspired by RecursiveNet (HighResNet). Includes pretrained Checkpoints.

MISR-S2 -> Cross-sensor super-resolution of irregularly sampled Sentinel-2 time series

Single image super-resolution (SISR)

Swin2-MoSE -> Swin2-MoSE: A New Single Image Super-Resolution Model for Remote Sensing

sentinel2_superresolution -> Super-resolution of 10 Sentinel-2 bands to 5-meter resolution, starting from L1C or L2A (Theia format) products. Trained on Sen2Venµs

Super Resolution for Satellite Imagery - srcnn repo

TensorFlow implementation of "Accurate Image Super-Resolution Using Very Deep Convolutional Networks" adapted for working with geospatial data

Random Forest Super-Resolution (RFSR repo) including sample data

Enhancing Sentinel 2 images by combining Deep Image Prior and Decrappify. Repo for deep-image-prior and article on decrappify

Image Super-Resolution using an Efficient Sub-Pixel CNN -> the keras docs have a great tutorial on this lightweight but well-performing model

super-resolution-using-gan -> Super-Resolution of Sentinel-2 Using Generative Adversarial Networks

Super-resolution of Multispectral Satellite Images Using Convolutional Neural Networks

Multi-temporal Super-Resolution on Sentinel-2 Imagery using HighRes-Net, repo

SSPSR-Pytorch -> A spatial-spectral prior deep network for single hyperspectral image super-resolution

Sentinel-2 Super-Resolution: High Resolution For All (Bands)

CinCGAN -> Unsupervised Image Super-Resolution using Cycle-in-Cycle Generative Adversarial Networks

Satellite-image-SRGAN using PyTorch

EEGAN -> Edge Enhanced GAN For Remote Sensing Image Super-Resolution, TensorFlow 1.1

PECNN -> A Progressively Enhanced Network for Video Satellite Imagery Super-Resolution, minimal documentation

hs-sr-tvtv -> Enhanced Hyperspectral Image Super-Resolution via RGB Fusion and TV-TV Minimization

sr4rs -> Super resolution for remote sensing, with pre-trained model for Sentinel-2, SRGAN-inspired

Restoring old aerial images with Deep Learning -> Medium article on Super Resolution with Perceptual Loss function and real images as input

RFSR_TGRS -> Hyperspectral Image Super-Resolution via Recurrent Feedback Embedding and Spatial-Spectral Consistency Regularization

SEN2VENµS -> a dataset for the training of Sentinel-2 super-resolution algorithms. With paper

TransENet -> Transformer-based Multi-Stage Enhancement for Remote Sensing Image Super-Resolution

SG-FBGAN -> Remote Sensing Image Super-Resolution via Saliency-Guided Feedback GANs

finetune_ESRGAN -> finetune the ESRGAN super resolution generator for remote sensing images and video

MIP -> Unsupervised Remote Sensing Super-Resolution via Migration Image Prior

Optical-RemoteSensing-Image-Resolution -> Deep Memory Connected Neural Network for Optical Remote Sensing Image Restoration. Two applications: Gaussian image denoising and single image super-resolution

HSENet -> Hybrid-Scale Self-Similarity Exploitation for Remote Sensing Image Super-Resolution

SR_RemoteSensing -> Super-Resolution deep learning models for remote sensing data based on BasicSR

RSI-Net -> A Deep Multi-task Convolutional Neural Network for Remote Sensing Image Super-resolution and Colorization

EDSR-Super-Resolution -> EDSR model using PyTorch applied to satellite imagery

CycleCNN -> Nonpairwise-Trained Cycle Convolutional Neural Network for Single Remote Sensing Image Super-Resolution

SISR with Real-World Degradation Modeling -> Single-Image Super Resolution of Remote Sensing Images with Real-World Degradation Modeling

pixel-smasher -> Super-Resolution Surface Water Mapping on the Canadian Shield Using Planet CubeSat Images and a Generative Adversarial Network

satellite-image-super-resolution -> A Comparative Study on CNN-Based Single-Image Super-Resolution Techniques for Satellite Images

SatelliteSR -> comparison of a number of techniques on the DOTA dataset

Image-Super-Resolution -> Super resolution RESNET network

Unsupervised Super Resolution for Sentinel-2 satellite imagery -> using Deep Image Prior (DIP), Zero-Shot Super Resolution (ZSSR) & Degradation-Aware Super Resolution (DASR)

Spectral Super-Resolution of Satellite Imagery with Generative Adversarial Networks

Super resolution using GAN / 4x Improvement -> applied to Sentinel 2

rs-esrgan -> RS-ESRGAN: Super-Resolution of Sentinel-2 Imagery Using Generative Adversarial Networks

TS-RSGAN -> Super-Resolution of Remote Sensing Images for ×4 Resolution without Reference Images. Applied to Sentinel-2

CDCR -> Combining Discrete and Continuous Representation: Scale-Arbitrary Super-Resolution for Satellite Images

FunSR -> Continuous Remote Sensing Image Super-Resolution based on Context Interaction in Implicit Function Space

HAUNet_RSISR -> Hybrid Attention-Based U-Shaped Network for Remote Sensing Image Super-Resolution

L1BSR -> Exploiting Detector Overlap for Self-Supervised SISR of Sentinel-2 L1B Imagery

Deep-Harmonization -> Deep Learning-based Harmonization and Super-Resolution of Landsat-8 and Sentinel-2 images

SGDM -> Semantic Guided Large Scale Factor Remote Sensing Image Super-resolution with Generative Diffusion Prior

Super-resolution - Miscellaneous

The value of super resolution — real world use case -> Medium article on parcel boundary detection with super-resolved satellite imagery

Super-Resolution on Satellite Imagery using Deep Learning -> Nov 2016 blog post by CosmiQ Works with a nice introduction to the topic. Proposes and demonstrates a new architecture with perturbation layers, with practical guidance on the methodology and code. Three-part series

Introduction to spatial resolution

Awesome-Super-Resolution -> another 'awesome' repo, getting a little out of date now

Super-Resolution (python) Utilities for managing large satellite images

pytorch-enhance -> Library of Image Super-Resolution Models, Datasets, and Metrics for Benchmarking or Pretrained Use. Also checkout this implementation in Jax

Super Resolution in OpenCV

AI-based Super resolution and change detection to enforce Sentinel-2 systematic usage -> Worldview-2 images (2m) were used to create a reference dataset and increase the spatial resolution of the Copernicus sensor from 10m to 5m

Model-Guided Deep Hyperspectral Image Super-resolution -> code accompanying the paper: Model-Guided Deep Hyperspectral Image Super-Resolution

Super-resolving beyond satellite hardware -> paper assessing SR performance in reconstructing realistically degraded satellite images

satellite-pixel-synthesis-pytorch -> PyTorch implementation of NeurIPS 2021 paper: Spatial-Temporal Super-Resolution of Satellite Imagery via Conditional Pixel Synthesis

SRE-HAN -> Squeeze-and-Residual-Excitation Holistic Attention Network improves super-resolution (SR) on remote-sensing imagery compared to other state-of-the-art attention-based SR models

satsr -> A project to perform super-resolution on multispectral images from any satellite, including Sentinel 2, Landsat 8, VIIRS & MODIS

OLI2MSI -> dataset for remote sensing imagery super-resolution composed of Landsat8-OLI and Sentinel2-MSI images

MMSR -> Learning Mutual Modulation for Self-Supervised Cross-Modal Super-Resolution

HSRnet -> Hyperspectral Image Super-resolution via Deep Spatio-spectral Attention Convolutional Neural Networks

RRSGAN -> RRSGAN: Reference-Based Super-Resolution for Remote Sensing Image

HDR-DSP-SR -> Self-supervised multi-image super-resolution for push-frame satellite images

GAN-HSI-SR -> Hyperspectral Image Super-Resolution by Band Attention Through Adversarial Learning

Pansharpening

Pansharpening is a data fusion method that merges the high spatial detail from a high-resolution panchromatic image with the rich spectral information from a lower-resolution multispectral image. The result is a single, high-resolution color image that retains both the sharpness of the panchromatic band and the color information of the multispectral bands. This process enhances the spatial resolution while preserving the spectral qualities of the original images.

Several algorithms are described in the ArcGIS docs, the simplest being to take the mean of the pan and RGB pixel values (see the sketch below).
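
A sketch of that simple mean-based approach, assuming the multispectral bands have already been resampled to the panchromatic grid; the arrays below are synthetic placeholders.

```python
# Minimal sketch of simple mean-based pansharpening: each multispectral band
# (already resampled to the panchromatic grid) is averaged with the panchromatic band.
import numpy as np

def simple_mean_pansharpen(ms_upsampled: np.ndarray, pan: np.ndarray) -> np.ndarray:
    """ms_upsampled: (bands, H, W) multispectral image resampled to pan resolution.
    pan: (H, W) panchromatic band. Returns a (bands, H, W) sharpened image."""
    return (ms_upsampled + pan[np.newaxis, :, :]) / 2.0

ms = np.random.rand(3, 512, 512)   # placeholder upsampled RGB bands
pan = np.random.rand(512, 512)     # placeholder high-resolution panchromatic band
sharpened = simple_mean_pansharpen(ms, pan)
print(sharpened.shape)
```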

PGCU -> Probability-based Global Cross-modal Upsampling for Pansharpening

rio-pansharpen -> pansharpening Landsat scenes

Simple-Pansharpening-Algorithms

Working-For-Pansharpening -> long list of pansharpening methods and update of Awesome-Pansharpening

PSGAN -> A Generative Adversarial Network for Remote Sensing Image Pan-sharpening

Pansharpening-by-Convolutional-Neural-Network

PBR_filter -> Pansharpening by Background Removal algorithm for sharpening RGB images

py_pansharpening -> multiple algorithms implemented in python

Deep-Learning-PanSharpening -> deep-learning based pan-sharpening code package; reimplements PNN, MSDCNN, PanNet, TFNet, SRPPNN and the authors' proposed network DIPNet

HyperTransformer -> A Textural and Spectral Feature Fusion Transformer for Pansharpening

DIP-HyperKite -> Hyperspectral Pansharpening Based on Improved Deep Image Prior and Residual Reconstruction

D2TNet -> A ConvLSTM Network with Dual-direction Transfer for Pan-sharpening

PanColorGAN-VHR-Satellite-Images -> Rethinking CNN-Based Pansharpening: Guided Colorization of Panchromatic Images via GANs

MTL_PAN_SEG -> Multi-task deep learning for satellite image pansharpening and segmentation

Z-PNN -> Pansharpening by convolutional neural networks in the full resolution framework

GTP-PNet -> GTP-PNet: A residual learning network based on gradient transformation prior for pansharpening

UDL -> Dynamic Cross Feature Fusion for Remote Sensing Pansharpening

PSData -> A Large-Scale General Pan-sharpening DataSet, which contains PSData3 (QB, GF-2, WV-3) and PSData4 (QB, GF-1, GF-2, WV-2).

AFPN -> Adaptive Detail Injection-Based Feature Pyramid Network For Pan-sharpening

pan-sharpening -> multiple methods demonstrated for multispectral and panchromatic images

PSGan-Family -> PSGAN: A Generative Adversarial Network for Remote Sensing Image Pan-Sharpening

PanNet-Landsat -> A Deep Network Architecture for Pan-Sharpening

DLPan-Toolbox -> Machine Learning in Pansharpening: A Benchmark, from Shallow to Deep Networks

LPPN -> Laplacian pyramid networks: A new approach for multispectral pansharpening

S2_SSC_CNN -> Zero-shot Sentinel-2 Sharpening Using A Symmetric Skipped Connection Convolutional Neural Network

S2S_UCNN -> Sentinel 2 sharpening using a single unsupervised convolutional neural network with MTF-Based degradation model

SSE-Net -> Spatial and Spectral Extraction Network With Adaptive Feature Fusion for Pansharpening

UCGAN -> Unsupervised Cycle-consistent Generative Adversarial Networks for Pan-sharpening

GCPNet -> When Pansharpening Meets Graph Convolution Network and Knowledge Distillation

PanFormer -> PanFormer: a Transformer Based Model for Pan-sharpening

Pansharpening -> Pansformers: Transformer-Based Self-Attention Network for Pansharpening

Sentinel-2 Band Pan-Sharpening

Image-to-image translation

Image-to-image translation is a crucial aspect of computer vision that utilizes machine learning models to transform an input image into a new, distinct output image. In the field of remote sensing, it plays a significant role in bridging the gap between different imaging domains, such as converting Synthetic Aperture Radar (SAR) images into RGB (Red Green Blue) images. This technology has a wide range of applications, including improving image quality, filling in missing information, and facilitating cross-domain image analysis and comparison. By leveraging deep learning algorithms, image-to-image translation has become a powerful tool in the arsenal of remote sensing researchers and practitioners. Image source
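As a rough illustration of how the pix2pix-style resources below work, here is a hedged PyTorch sketch of a single conditional-GAN training step: the generator is optimised with an adversarial loss plus an L1 loss against the paired target image. The tiny `G`/`D` networks and random tensors are placeholders standing in for a U-Net generator, a PatchGAN discriminator and real paired imagery.

```python
import torch
import torch.nn as nn

# Placeholder networks: swap in a U-Net generator and PatchGAN discriminator in practice
G = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 3, 3, padding=1))
D = nn.Sequential(nn.Conv2d(6, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                  nn.Conv2d(16, 1, 3, stride=2, padding=1))  # outputs a patch of real/fake logits

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
lambda_l1 = 100.0  # L1 weight used in the pix2pix paper

src = torch.rand(4, 3, 64, 64)   # toy stand-in for e.g. SAR or satellite input
tgt = torch.rand(4, 3, 64, 64)   # paired target, e.g. optical image or map tile

# --- discriminator step: real pairs vs generated pairs ---
fake = G(src).detach()
d_real = D(torch.cat([src, tgt], dim=1))
d_fake = D(torch.cat([src, fake], dim=1))
loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# --- generator step: fool the discriminator while staying close to the target (L1) ---
fake = G(src)
d_fake = D(torch.cat([src, fake], dim=1))
loss_g = bce(d_fake, torch.ones_like(d_fake)) + lambda_l1 * l1(fake, tgt)
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
print(float(loss_d), float(loss_g))
```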

How to Develop a Pix2Pix GAN for Image-to-Image Translation -> how to develop a Pix2Pix model for translating satellite photographs to Google map images. A good intro to GANs

A growing problem of ‘deepfake geography’: How AI falsifies satellite images

Kaggle Pix2Pix Maps -> dataset for pix2pix to take a google map satellite photo and build a street map

guided-deep-decoder -> With guided deep decoder, you can solve different image pair fusion problems, allowing super-resolution, pansharpening or denoising

hackathon-ci-2020 -> generate nighttime imagery from infrared observations

satellite-to-satellite-translation -> VAE-GAN architecture for unsupervised image-to-image translation with shared spectral reconstruction loss. Model is trained on GOES-16/17 and Himawari-8 L1B data

Pytorch implementation of UNet for converting aerial satellite images into google maps kinda images

Seamless-Satellite-image-Synthesis -> generate arbitrarily large RGB images from a map

How to Develop a Pix2Pix GAN for Image-to-Image Translation -> article on machinelearningmastery.com

Satellite-Imagery-to-Map-Translation-using-Pix2Pix-GAN-framework

RSIT_SRM_ISD -> PyTorch implementation of Remote sensing image translation via style-based recalibration module and improved style discriminator

pix2pix_google_maps -> Converts satellite images to map images using pix2pix models

sar2color-igarss2018-chainer -> Image Translation Between Sar and Optical Imagery with Generative Adversarial Nets

HSI2RGB -> Create realistic looking RGB images using remote sensing hyperspectral images

sat_to_map -> Learning mappings to generate city maps images from corresponding satellite images

pix2pix-GANs -> Generate Map using Satellite Image & PyTorch

map-sat -> Generate Your Own Scotland: Satellite Image Generation Conditioned on Maps

Data fusion

Data fusion is a technique for combining information from different sources such as Synthetic Aperture Radar (SAR), optical imagery, and non-imagery data such as Internet of Things (IoT) sensor data. The integration of diverse data sources enables data fusion to overcome the limitations of individual sources, leading to the creation of models that are more accurate and informative than those constructed from a single source. Image source
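A minimal sketch of the simplest fusion strategy, "early fusion", where co-registered SAR and optical bands are stacked along the channel dimension and fed to a single CNN. The band counts and toy network below are illustrative assumptions, not any particular method from the list.

```python
import torch
import torch.nn as nn

class EarlyFusionNet(nn.Module):
    """Toy classifier that fuses SAR and optical inputs by channel concatenation."""
    def __init__(self, sar_bands=2, optical_bands=4, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(sar_bands + optical_bands, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, sar, optical):
        x = torch.cat([sar, optical], dim=1)  # early fusion: stack the sensors' bands
        return self.head(self.backbone(x).flatten(1))

model = EarlyFusionNet()
sar = torch.rand(8, 2, 64, 64)       # toy stand-in for Sentinel-1 VV/VH patches
optical = torch.rand(8, 4, 64, 64)   # toy stand-in for Sentinel-2 RGB+NIR patches
print(model(sar, optical).shape)     # torch.Size([8, 10])
```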

Awesome-Data-Fusion-for-Remote-Sensing

UDALN_GRSL -> Deep Unsupervised Blind Hyperspectral and Multispectral Data Fusion

CropTypeMapping -> Crop type mapping from optical and radar (Sentinel-1&2) time series using attention-based deep learning

Multimodal-Remote-Sensing-Toolkit -> uses Hyperspectral and LiDAR Data

Aerial-Template-Matching -> development of an algorithm for template Matching on aerial imagery applied to UAV dataset

DDA_UrbanExtraction -> Unsupervised Domain Adaptation for Global Urban Extraction using Sentinel-1 and Sentinel-2 Data

swinstfm -> Remote Sensing Spatiotemporal Fusion using Swin Transformer

LoveCS -> Cross-sensor domain adaptation for high-spatial resolution urban land-cover mapping: from airborne to spaceborne imagery

comingdowntoearth -> Implementation of 'Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization'

Matching between acoustic and satellite images

MapRepair -> Deep Cadastre Maps Alignment and Temporal Inconsistencies Fix in Satellite Images

Compressive-Sensing-and-Deep-Learning-Framework -> Compressive Sensing is used as an initial guess to combine data from multiple sources, with LSTM used to refine the result

DeepSim -> DeepSIM: GPS Spoofing Detection on UAVs using Satellite Imagery Matching

MHF-net -> Multispectral and Hyperspectral Image Fusion by MS/HS Fusion Net

Remote_Sensing_Image_Fusion -> Semi-Supervised Remote Sensing Image Fusion Using Multi-Scale Conditional Generative Adversarial network with Siamese Structure

CNNs for Multi-Source Remote Sensing Data Fusion -> Single-stream CNN with Learnable Architecture for Multi-source Remote Sensing Data

Deep Generative Reflectance Fusion -> Achieving Landsat-like reflectance at any date by fusing Landsat and MODIS surface reflectance with deep generative models

IEEE_TGRS_MDL-RS -> More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification

SSRNET -> SSR-NET: Spatial-Spectral Reconstruction Network for Hyperspectral and Multispectral Image Fusion

cross-view-image-matching -> Bridging the Domain Gap for Ground-to-Aerial Image Matching

CoF-MSMG-PCNN -> Remote Sensing Image Fusion via Boundary Measured Dual-Channel PCNN in Multi-Scale Morphological Gradient Domain

robust_matching_network_on_remote_sensing_imagery_pytorch -> A Robust Matching Network for Gradually Estimating Geometric Transformation on Remote Sensing Imagery

edcstfn -> An Enhanced Deep Convolutional Model for Spatiotemporal Image Fusion

ganstfm -> A Flexible Reference-Insensitive Spatiotemporal Fusion Model for Remote Sensing Images Using Conditional Generative Adversarial Network

CMAFF -> Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery

SOLC -> MCANet: A joint semantic segmentation framework of optical and SAR images for land use classification. Uses WHU-OPT-SAR-dataset

MFT -> Multimodal Fusion Transformer for Remote Sensing Image Classification

HSHT-Satellite-Imagery-Synthesis -> Improving Flood Maps by Increasing the Temporal Resolution of Satellites Using Hybrid Sensor Fusion

MDC -> Unsupervised Data Fusion With Deeper Perspective: A Novel Multisensor Deep Clustering Algorithm

FusAtNet -> FusAtNet: Dual Attention based SpectroSpatial Multimodal Fusion Network for Hyperspectral and LiDAR Classification

AMM-FuseNet -> Attention-Based Multi-Modal Image Fusion Network for Land Cover Mapping

MANet -> MANet: A Network Architecture for Remote Sensing Spatiotemporal Fusion Based on Multiscale and Attention Mechanisms

deforestation-from-data-fusion -> Fusing Sentinel-1 and Sentinel-2 images for deforestation detection in the Brazilian Amazon under diverse cloud conditions

sct-fusion -> Transformer-based Multi-Modal Learning for Multi Label Remote Sensing Image Classification

RSI-MMSegmentation -> GAMUS: A Geometry-aware Multi-modal Semantic Segmentation Benchmark for Remote Sensing Data

dfc2022-baseline -> baseline solution to the 2022 IEEE GRSS Data Fusion Contest (DFC2022) using TorchGeo, PyTorch Lightning, and Segmentation Models PyTorch to train a U-Net with a ResNet-18 backbone and a loss function of Focal + Dice loss to perform semantic segmentation on the DFC2022 dataset

multiviewRS-models -> List of multi-view fusion learning models proposed for remote sensing (RS) multi-view data

Generative networks

Generative networks (e.g. GANs) aim to generate new, synthetic data that appears similar to real-world data. This generated data can be used for a wide range of purposes, including data augmentation, data imbalance correction, and filling in missing or corrupted data. Including generating synthetic data can improve the performance of remote sensing algorithms and models, leading to more accurate and reliable results. Image source

Using Generative Adversarial Networks to Address Scarcity of Geospatial Training Data -> GAN perform better than CNN in segmenting land cover classes outside of the training dataset (article, no code)

Building-A-Nets -> robust building extraction from high-resolution remote sensing images with adversarial networks

GANmapper -> a building footprint generator using Generative Adversarial Networks

CSA-CDGAN -> Channel Self-Attention Based Generative Adversarial Network for Change Detection of Remote Sensing Images

DSGAN -> a conditional GAN for dynamic precipitation downscaling

MarsGAN -> GAN trained on satellite photos of Mars

HC_ADGAN -> codes for the paper Adaptive Dropblock Enhanced Generative Adversarial Networks for Hyperspectral Image Classification

SCALAE -> Formatting the Landscape: Spatial conditional GAN for varying population in satellite imagery. Method to generate satellite imagery from custom 2D population maps

Satellite-Image-Forgery-Detection-and-Localization

STGAN -> PyTorch Implementation of STGAN for Cloud Removal in Satellite Images

ds-gan-spatiotemporal-evaluation -> evaluating use of deep generative models in remote sensing applications

GAN-based method to generate high-resolution remote sensing for data augmentation and image classification

Remote-Sensing-Image-Generation -> Generate RS Images using Generative Adversarial Networks (GAN)

PSGan-Family -> A Generative Adversarial Network for Remote Sensing Image Pan-Sharpening

Satellite Image Augmentation with GANs -> Image Augmentation for Satellite Images

opt2sar-cyclegan -> Research on SAR image generation method based on non-homologous data

sentinel-cgan -> code for article : Generative adversarial networks in satellite image datasets augmentation

Shoreline_Extraction_GAN -> Shoreline extraction via generative adversarial networks, prediction via LSTMs

Landsat8-Sentinel2-Fusion -> Translating Landsat 8 to Sentinel-2 using a GAN

Seg2Sat -> Seg2Sat explores the potential of diffusion algorithms such as StableDiffusion and ControlNet to generate aerial images based on terrain segmentation data

SAR2Optical -> Transcoding Sentinel-1 SAR to Sentinel-2 using cGAN

Autoencoders, dimensionality reduction, image embeddings & similarity search

Autoencoders are a type of neural network that aim to simplify the representation of input data by compressing it into a lower dimensional form. This is achieved through a two-step process of encoding and decoding, where the encoding step compresses the data into a lower dimensional representation, and the decoding step restores the data back to its original form. The goal of this process is to reduce the data's dimensionality, making it easier to store and process, while retaining the essential information. Dimensionality reduction, as the name suggests, refers to the process of reducing the number of dimensions in a dataset. This can be achieved through various techniques such as principal component analysis (PCA) or singular value decomposition (SVD). Autoencoders are one type of neural network that can be used for dimensionality reduction. In the field of computer vision, image embeddings are vector representations of images that capture the most important features of the image. These embeddings can then be used to perform similarity searches, where images are compared based on their features to find similar images. This process can be used in a variety of applications, such as image retrieval, where images are searched based on certain criteria like color, texture, or shape. It can also be used to identify duplicate images in a dataset. Image source
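The sketch below ties these ideas together in PyTorch: a small convolutional autoencoder compresses image patches into low-dimensional embeddings, and cosine similarity over those embeddings gives a crude similarity search. The architecture, patch size and random data are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvAutoencoder(nn.Module):
    def __init__(self, bands=3, embed_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(bands, 16, 3, stride=2, padding=1), nn.ReLU(),   # 32x32 -> 16x16
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),      # 16x16 -> 8x8
            nn.Flatten(), nn.Linear(32 * 8 * 8, embed_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 32 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, bands, 2, stride=2), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = ConvAutoencoder()
patches = torch.rand(100, 3, 32, 32)              # toy stand-in for image patches
recon, embeddings = model(patches)
loss = F.mse_loss(recon, patches)                 # reconstruction objective used for training

# crude similarity search: rank patches by cosine similarity to a query embedding
query = embeddings[0:1]
scores = F.cosine_similarity(query, embeddings, dim=1)
top5 = scores.topk(5).indices                     # most similar patches (includes the query itself)
print(loss.item(), top5.tolist())
```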

Autoencoders & their Application in Remote Sensing -> intro article and example use case applied to SAR data for land classification

LEt-SNE -> Dimensionality Reduction and visualization technique that compensates for the curse of dimensionality

AutoEncoders for Land Cover Classification of Hyperspectral Images -> An autoencoder neural net is used to reduce 103 band data to 60 features (dimensionality reduction), keras. Also read part 2 which implements K-NNC, SVM and Gradient Boosting

Image-Similarity-Search -> an app that helps perform super fast image retrieval on PyTorch models for better embedding space interpretability

Interactive-TSNE -> a tool that provides a way to visually view a PyTorch model's feature representation for better embedding space interpretability

RoofNet -> identify roof age using historical satellite images to lower the customer acquisition cost for new solar installations. Uses a VAE: Variational Autoencoder

Visual search over billions of aerial and satellite images -> implemented at Descartes labs

parallax -> Tool for interactive embeddings visualization

Mxnet repository for generating embeddings on satellite images -> Includes sampling of images, mining algorithms, different architectures, error functions, measures for evaluation.

Fine tuning CLIP with Remote Sensing (Satellite) images and captions -> fine tuning CLIP on the RSICD image captioning dataset, to enable querying large catalogues in natural language. With repo, uses 🤗

Image search with 🤗 datasets -> tutorial on fine tuning an image search model

GRN-SNDL -> model the relations between samples (or scenes) by making use of a graph structure which is fed into network learning

SauMoCo -> Deep Unsupervised Embedding for Remotely Sensed Images Based on Spatially Augmented Momentum Contrast

TGRS_RiDe -> Rotation Invariant Deep Embedding for RemoteSensing Images

RaVAEn -> RaVAEn is a lightweight, unsupervised approach for change detection in satellite data based on Variational Auto-Encoders (VAEs) with the specific purpose of on-board deployment

Reverse image search using deep discrete feature extraction and locality-sensitive hashing

SNCA_CE -> Deep Metric Learning based on Scalable Neighborhood Components for Remote Sensing Scene Characterization

LandslideDetection-from-satellite-imagery -> Using Attention and Autoencoder boosted CNN

split-brain-remote-sensing -> Analysis of Color Space Quantization in Split-Brain Autoencoder for Remote Sensing Image Classification

image-similarity-measures -> Implementation of eight evaluation metrics to assess the similarity between two images. Blog post here

Large_Scale_GeoVisual_Search -> ResNet architecture on UC Merced Land Use Dataset with hamming distance for similarity based search

geobacter -> Generates useful feature embeddings for geospatial locations

Satellite-Image-Segmentation -> the KV-Net model uses this feature of autoencoders to reconnect the disconnected roads

Satellite-Image-Enhancement -> Image enhancement using GAN's and autoencoders

Variational-Autoencoder-For-Satellite-Imagery -> a special VAE to squeeze N images into one single representation with colors segmenting the different objects

DINCAE -> Data-Interpolating Convolutional Auto-Encoder is a neural network to reconstruct missing data in satellite observations

3D_SITS_Clustering -> Unsupervised Satellite Image Time Series Clustering Using Object-Based Approaches and 3D Convolutional Autoencoder

sat_cnn -> Estimating Generalized Measures of Local Neighbourhood Context from Multispectral Satellite Images Using a Convolutional Neural Network. Uses a convolutional autoencoder (CAE)

you-are-here -> You Are Here: Geolocation by Embedding Maps and Images

Tensorflow similarity -> offers state-of-the-art algorithms for metric learning and all the necessary components to research, train, evaluate, and serve similarity-based models

Train SimSiam on Satellite Images using lightly.ai to generate embeddings that can be used for data exploration and understanding

Airbus_SDC_dup -> Project focused on detecting duplicate regions of overlapping satellite imagery. Applied to Airbus ship detection dataset

scale-mae -> Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

Cross-Scale-MAE -> code for paper: Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing

satclip -> A Global, General-Purpose Geographic Location Encoder from Microsoft

Astronaut Photography Localization & Iterative Coregistration

rs-cbir -> Satellite Image Vector Database and Multimodal Search using fine-tuned ResNet50 on AID dataset

TorchSpatial -> A Location Encoding Framework and Benchmark for Spatial Representation Learning

Anomaly detection

Anomaly detection refers to the process of identifying unusual patterns or outliers in satellite or aerial images that do not conform to expected norms. This is crucial in applications such as environmental monitoring, defense surveillance, and urban planning. Machine learning algorithms, particularly unsupervised learning methods, are used to analyze vast amounts of remote sensing data efficiently. These algorithms learn the typical patterns and variations in the data, allowing them to flag anomalies such as unexpected land cover changes, illegal deforestation, or unusual maritime activities. The detection of these anomalies can provide valuable insights for timely decision-making and intervention in various fields.
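One common recipe, hinted at by several entries in this list, is reconstruction-based anomaly detection: a model trained to reconstruct "normal" scenes reconstructs anomalous ones poorly, so patches with high reconstruction error are flagged. A minimal sketch, assuming `model` is any trained autoencoder that returns reconstructions (the identity "model" and threshold below are toy stand-ins):

```python
import torch

def flag_anomalies(model, patches, threshold):
    """Flag patches whose per-patch reconstruction error exceeds a threshold.

    model:   a trained autoencoder mapping (N, C, H, W) -> reconstructions of the same shape
    patches: tensor of image patches, (N, C, H, W)
    """
    model.eval()
    with torch.no_grad():
        recon = model(patches)
        errors = ((recon - patches) ** 2).flatten(1).mean(dim=1)  # MSE per patch
    return errors, errors > threshold

# toy usage: an identity "model" stands in for a trained autoencoder
toy_model = torch.nn.Identity()
patches = torch.rand(16, 4, 32, 32)
errors, is_anomaly = flag_anomalies(toy_model, patches, threshold=0.05)
print(errors.shape, int(is_anomaly.sum()))
```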

marine-anomaly-detection -> Semantic segmentation of marine anomalies using semi-supervised learning (FixMatch for semantic segmentation) on Sentinel-2 multispectral images

TDD -> One-Step Detection Paradigm for Hyperspectral Anomaly Detection via Spectral Deviation Relationship Learning

anomaly-detection-in-SAR-imagery -> identify an unknown ship in docks using keras & retinanet

pub-ffi-gan -> Applying generative adversarial networks for anomaly detection in hyperspectral remote sensing imagery

How Airbus Detects Anomalies in ISS Telemetry Data Using TFX -> uses an autoencoder

AgriSen-COG -> a Multicountry, Multitemporal Large-Scale Sentinel-2 Benchmark Dataset for Crop Mapping: includes an anomaly detection preprocessing step

Image retrieval

Image retrieval is the task of retrieving images from a collection that are similar to a query image. Image retrieval plays a vital role in remote sensing by enabling the efficient and effective search for relevant images from large image archives, and by providing a way to quantify changes in the environment over time. Image source
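Several of the entries below learn binary hash codes so that retrieval reduces to fast Hamming-distance ranking over the archive. The sketch below shows only the retrieval mechanics, using a random projection in place of a learned hash function; the embeddings are toy stand-ins for learned image features.

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 256))     # stand-in for learned image features
projection = rng.normal(size=(256, 64))       # random projection; a learned hash function in practice

codes = (embeddings @ projection > 0).astype(np.uint8)   # 64-bit binary hash codes

def retrieve(query_idx, k=5):
    """Rank the archive by Hamming distance to the query's hash code."""
    dists = (codes ^ codes[query_idx]).sum(axis=1)
    order = np.argsort(dists)
    return order[order != query_idx][:k]

print(retrieve(0))
```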

Demo_AHCL_for_TGRS2022 -> Asymmetric Hash Code Learning (AHCL) for remote sensing image retrieval

GaLR -> Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information

retrievalSystem -> cross-modal image retrieval system

AMFMN -> Exploring a Fine-grained Multiscale Method for Cross-modal Remote Sensing Image Retrieval

Active-Learning-for-Remote-Sensing-Image-Retrieval -> unofficial implementation of paper: A Novel Active Learning Method in Relevance Feedback for Content-Based Remote Sensing Image Retrieval

CMIR-NET -> A deep learning based model for cross-modal retrieval in remote sensing

Deep-Hash-learning-for-Remote-Sensing-Image-Retrieval -> Deep Hash Learning for Remote Sensing Image Retrieval

MHCLN -> Deep Metric and Hash-Code Learning for Content-Based Retrieval of Remote Sensing Images

HydroViet_VOR -> Object Retrieval in satellite images with Triplet Network

remote-sensing-image-retrieval -> Multi-Spectral Remote Sensing Image Retrieval using Geospatial Foundation Models (IBM Prithvi)

Composed Image Retrieval for Remote Sensing

CSMAE -> Cross-Sensor Masked Autoencoder for Content-Based Image Retrieval in Remote Sensing

Image Captioning

Image Captioning is the task of automatically generating a textual description of an image. In remote sensing, image captioning can be used to automatically generate captions for satellite or aerial images, which can be useful for a variety of purposes, such as image search and retrieval, data cataloging, and data dissemination. The generated captions can provide valuable information about the content of the images, including the location, the type of terrain or objects present, and the weather conditions, among others. This information can be used to quickly and easily understand the content of the images, without having to manually examine each image. Image source

awesome-remote-image-captioning -> a list of awesome remote sensing image captioning resources

awesome-vision-language-models-for-earth-observation

CapFormer -> Pure transformer for remote sensing image caption

remote_sensing_image_captioning -> Region Driven Remote Sensing Image Captioning

Remote Sensing Image Captioning with Transformer and Multilabel Classification

Siamese-spatial-Graph-Convolution-Network -> Siamese graph convolutional network for content based remote sensing image retrieval

MLAT -> Remote-Sensing Image Captioning Based on Multilayer Aggregated Transformer

WordSent -> Word–Sentence Framework for Remote Sensing Image Captioning

a-mask-guided-transformer-with-topic-token -> A Mask-Guided Transformer Network with Topic Token for Remote Sensing Image Captioning

Meta captioning -> A meta learning based remote sensing image captioning framework

Transformer-for-image-captioning -> a transformer for image captioning, trained on the UCM dataset

remote-sensing-image-caption -> image classification and image caption by PyTorch

Fine tuning CLIP with Remote Sensing (Satellite) images and captions -> fine tuning CLIP on the RSICD image captioning dataset, to enable querying large catalogues in natural language. With repo, uses 🤗. Also read Why and How to Fine-tune CLIP

Visual Question Answering

Visual Question Answering (VQA) is the task of automatically answering a natural language question about an image. In remote sensing, VQA enables users to interact with the images and retrieve information using natural language questions. For example, a user could ask a VQA system questions such as "What is the type of land cover in this area?", "What is the dominant crop in this region?" or "What is the size of the city in this image?". The system would then analyze the image and generate an answer based on its understanding of the image content.

VQA-easy2hard -> From Easy to Hard: Learning Language-guided Curriculum for Visual Question Answering on Remote Sensing Data

lit4rsvqa -> LiT-4-RSVQA: Lightweight Transformer-based Visual Question Answering in Remote Sensing

Change-Agent -> Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis

Mixed data learning

Mixed data learning is the process of learning from datasets that may contain a mix of image, textual and numeric data. Mixed data learning can help improve the accuracy of models by allowing them to learn from multiple sources at once and use more sophisticated methods to identify patterns and correlations.
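A minimal PyTorch sketch of the usual pattern: one branch extracts features from the image, another from the tabular features, and the two are concatenated before a prediction head. The sizes and random inputs are illustrative.

```python
import torch
import torch.nn as nn

class MixedDataNet(nn.Module):
    """Toy multi-input model: a CNN branch for imagery, an MLP branch for tabular features."""
    def __init__(self, bands=3, n_tabular=8, n_classes=2):
        super().__init__()
        self.image_branch = nn.Sequential(
            nn.Conv2d(bands, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),          # -> (N, 16)
        )
        self.tabular_branch = nn.Sequential(nn.Linear(n_tabular, 16), nn.ReLU())
        self.head = nn.Linear(16 + 16, n_classes)           # fuse by concatenation

    def forward(self, image, tabular):
        feats = torch.cat([self.image_branch(image), self.tabular_branch(tabular)], dim=1)
        return self.head(feats)

model = MixedDataNet()
images = torch.rand(4, 3, 64, 64)      # toy image patches
tabular = torch.rand(4, 8)             # toy numeric features (e.g. weather, road density)
print(model(images, tabular).shape)    # torch.Size([4, 2])
```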

Predicting the locations of traffic accidents with satellite imagery and convolutional neural networks -> Combining satellite imagery and structured data to predict the location of traffic accidents with a neural network of neural networks, with repo

Multi-Input Deep Neural Networks with PyTorch-Lightning - Combine Image and Tabular Data -> excellent intro article using pytorch, not actually applied to satellite data but to real estate data, with repo

Joint Learning from Earth Observation and OpenStreetMap Data to Get Faster Better Semantic Maps -> fusion based architectures and coarse-to-fine segmentation to include the OpenStreetMap layer into multispectral-based deep fully convolutional networks, arxiv paper

pyimagesearch article on mixed-data

pytorch-widedeep -> A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch

accidentRiskMap -> Inferring high-resolution traffic accident risk maps based on satellite imagery and GPS trajectories

Sub-meter resolution canopy height map by Meta -> Satellite Metadata combined with outputs from simple CNN to regress canopy height

Few & zero shot learning

This is a class of techniques which attempt to make predictions for classes with few, one or even zero examples provided during training. In zero shot learning (ZSL) the model is assisted by the provision of auxiliary information, which typically consists of descriptions/semantic attributes/word embeddings for both the seen and unseen classes at train time (ref). These approaches are particularly relevant to remote sensing, where there may be many examples of common classes, but few or even zero examples for other classes of interest.
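A minimal sketch of the prototype-based idea behind several few-shot methods listed below: average the embeddings of the few labelled support images per class to form prototypes, then classify query images by their nearest prototype. The embedding network and episode sizes here are placeholders.

```python
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten())   # placeholder embedding network

n_classes, n_support = 5, 3                              # a 5-way, 3-shot episode
support = torch.rand(n_classes, n_support, 3, 32, 32)    # few labelled examples per class
queries = torch.rand(10, 3, 32, 32)                      # unlabelled images to classify

with torch.no_grad():
    support_emb = embed(support.flatten(0, 1)).reshape(n_classes, n_support, -1)
    prototypes = support_emb.mean(dim=1)                 # one prototype per class
    query_emb = embed(queries)
    dists = torch.cdist(query_emb, prototypes)           # distance of each query to each prototype
    predictions = dists.argmin(dim=1)                    # nearest prototype = predicted class
print(predictions)
```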

Aerial-SAM -> Zero-Shot Refinement of Buildings’ Segmentation Models using SAM

FSODM -> Few-shot Object Detection on Remote Sensing Images

Few-Shot Classification of Aerial Scene Images via Meta-Learning -> 2020 publication, a classification model that can quickly adapt to unseen categories using only a few labeled samples

Papers about Few-shot Learning / Meta-Learning on Remote Sensing

SPNet -> Siamese-Prototype Network for Few-Shot Remote Sensing Image Scene Classification

MDL4OW -> Few-Shot Hyperspectral Image Classification With Unknown Classes Using Multitask Deep Learning

P-CNN -> Prototype-CNN for Few-Shot Object Detection in Remote Sensing Images

CIR-FSD-2022 -> Context Information Refinement for Few-Shot Object Detection in Remote Sensing Images

IEEE_TNNLS_Gia-CFSL -> Graph Information Aggregation Cross-Domain Few-Shot Learning for Hyperspectral Image Classification

TIP_2022_CMFSL -> Few-shot Learning with Class-Covariance Metric for Hyperspectral Image Classification

sen12ms-human-few-shot-classifier -> Humans are poor few-shot classifiers for Sentinel-2 land cover

S3Net -> S3Net: Spectral–Spatial Siamese Network for Few-Shot Hyperspectral Image Classification

SiameseNet-for-few-shot-Hyperspectral-Classification -> 3DCSN:SiameseNet-for-few-shot-Hyperspectral-Classification

MESSL -> Multiform Ensemble Self-Supervised Learning for Few-Shot Remote Sensing Scene Classification

SCCNet -> Self-Correlation and Cross-Correlation Learning for Few-Shot Remote Sensing Image Semantic Segmentation

OEM-Fewshot-Challenge -> OpenEarthMap Land Cover Mapping Few-Shot Challenge Generalized Few-shot Semantic Segmentation

meteor -> a small deep learning meta-model with a single output

SegLand -> Generalized Few-Shot Meets Remote Sensing: Discovering Novel Classes in Land Cover Mapping via Hybrid Semantic Segmentation Framework. 1st place in the OpenEarthMap Land Cover Mapping Few-Shot Challenge

Self-supervised, unsupervised & contrastive learning

Self-supervised, unsupervised and contrastive learning are all approaches that train on unlabeled data. Self-supervised learning creates its own supervisory signal from the data itself, for example through pretext tasks such as predicting masked or transformed parts of the input; unsupervised learning identifies patterns and structure in the data without any labels; and contrastive learning learns representations by pulling the embeddings of related pairs of samples together and pushing unrelated pairs apart. Note that self-supervised approaches are commonly used to train so-called foundation models, since they enable learning from large quantities of unlabelled data, typically time series.
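As a concrete example of the contrastive objective used by several of the methods below (which typically build positive pairs from augmentations, different seasons or co-located sensors), here is a minimal InfoNCE loss sketch; the random embeddings stand in for encoder outputs on two views of the same scenes.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Contrastive (InfoNCE) loss between two batches of embeddings of the same scenes.

    z1, z2: (N, D) embeddings of two views of the same N scenes; matching rows are
    positives and every other pairing in the batch acts as a negative.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (N, N) cosine similarities
    targets = torch.arange(z1.size(0))          # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# toy usage: random embeddings stand in for the outputs of an encoder on two views
z1, z2 = torch.rand(32, 128), torch.rand(32, 128)
print(info_nce(z1, z2).item())
```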

Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data -> Seasonal Contrast (SeCo) is an effective pipeline to leverage unlabeled data for in-domain pre-training of remote sensing representations. Models trained with SeCo achieve better performance than their ImageNet pre-trained counterparts and state-of-the-art self-supervised learning methods on multiple downstream tasks. paper and repo

Unsupervised Learning for Land Cover Classification in Satellite Imagery

Tile2Vec: Unsupervised representation learning for spatially distributed data

Contrastive Sensor Fusion -> Code implementing Contrastive Sensor Fusion, an approach for unsupervised learning of multi-sensor representations targeted at remote sensing imagery

hyperspectral-autoencoders -> Tools for training and using unsupervised autoencoders and supervised deep learning classifiers for hyperspectral data, built on tensorflow. Autoencoders are unsupervised neural networks that are useful for a range of applications such as unsupervised feature learning and dimensionality reduction.

Sentinel-2 image clustering in python

MARTA GANs: Unsupervised Representation Learning for Remote Sensing Image Classification

A generalizable and accessible approach to machine learning with global satellite imagery nature publication -> MOSAIKS is designed to solve an unlimited number of tasks at planet-scale quickly using feature vectors, with repo . Also see mosaiks-api

contrastive-satellite -> Using contrastive learning to create embeddings from optical EuroSAT (Sentinel-2) imagery

Self-Supervised Learning of Remote Sensing Scene Representations Using Contrastive Multiview Coding

Self-Supervised-Learner by spaceml-org -> train a classifier with fewer labeled examples needed using self-supervised learning, example applied to UC Merced land use dataset

deepsentinel -> a sentinel-1 and -2 self-supervised sensor fusion model for general purpose semantic embedding

geography-aware-ssl -> uses spatially aligned images over time to construct temporal positive pairs in contrastive learning and geo-location to design pre-text tasks

CNN-Supervised Classification -> Python code for self-supervised classification of remotely sensed imagery - part of the Deep Riverscapes project

clustimage -> a python package for unsupervised clustering of images

LandSurfaceClustering -> Land surface classification using remote sensing data with unsupervised machine learning (k-means)

K-Means Clustering for Surface Segmentation of Satellite Images

Sentinel-2 satellite imagery for crop classification using unsupervised clustering -> label groups of pixels based on temporal trends of their NDVI values

TheColorOutOfSpace -> The color out of space: learning self-supervised representations for Earth Observation imagery, using the BigEarthNet dataset

STEGO -> Unsupervised Semantic Segmentation by Distilling Feature Correspondences, with paper

SoundingEarth -> Self-supervised Audiovisual Representation Learning for Remote Sensing Data, uses the SoundingEarth Dataset

SSLRemoteSensing -> Semantic Segmentation of Remote Sensing Images With Self-Supervised Multitask Representation Learning

CBT -> Continual Barlow Twins: continual self-supervised learning for remote sensing semantic segmentation

Unsupervised Satellite Image Classification based on Partial Adversarial Domain Adaptation -> Code for course project

T2FTS -> Teaching Teachers First and Then Student: Hierarchical Distillation to Improve Long-Tailed Object Recognition in Aerial Images

DINO-MM -> Self-supervised Vision Transformers for Joint SAR-optical Representation Learning

SSL4EO-S12 -> a large-scale dataset for self-supervised learning in Earth observation

SSL4EO-Review -> Self-supervised Learning in Remote Sensing: A Review

transfer_learning_cspt -> Consecutive Pretraining: A Knowledge Transfer Learning Strategy with Relevant Unlabeled Data for Remote Sensing Domain

OTL -> Clustering-Based Representation Learning through Output Translation and Its Application to Remote-Sensing Images

vissl_experiments -> Self-supervised Learning using Facebook VISSL on the RESISC-45 satellite imagery classification dataset

MS2A-Net -> MS²A-Net: Multi-scale spectral–spatial association network for hyperspectral image clustering

pytorch-ssl-building_extract -> Research on Self-Supervised Building Information Extraction with High-Resolution Remote Sensing Images for Photovoltaic Potential Evaluation

self-rare-wildlife -> Self-Supervised Pretraining and Controlled Augmentation Improve Rare Wildlife Recognition in UAV Images

SatMAE -> SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery

FireCLR-Wildfires -> Unsupervised Wildfire Change Detection based on Contrastive Learning

FALSE -> False Negative Samples Aware Contrastive Learning for Semantic Segmentation of High-Resolution Remote Sensing Image

MATTER -> Self-Supervised Material and Texture Representation Learning for Remote Sensing Tasks

FGMAE -> Feature guided masked Autoencoder for self-supervised learning in remote sensing

GFM -> Towards Geospatial Foundation Models via Continual Pretraining

SatViT -> self-supervised training of multispectral optical and SAR vision transformers

SITS-MoCo -> Self-supervised pre-training for large-scale crop mapping using Sentinel-2 time series

Weakly & semi-supervised learning

Weakly and semi-supervised learning are two related families of methods that exploit imperfectly labeled or unlabeled data during training. Weakly supervised learning uses weak labels that may be incomplete, coarse or noisy, for example training a segmentation model from image-level labels. Semi-supervised learning combines a small set of labeled examples with a large pool of unlabeled data, typically by propagating information to the unlabeled samples via pseudo-labels or consistency constraints. Both techniques can improve the accuracy of machine learning models when fully annotated data is scarce.
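A minimal sketch of the pseudo-labelling idea used by several semi-supervised methods below (FixMatch and friends additionally pair weak and strong augmentations, which is omitted here): confident model predictions on unlabeled images are reused as training targets. The linear model and random patches are toy stand-ins.

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(model, unlabeled, confidence=0.95):
    """Keep only confident predictions on unlabeled data and use them as targets."""
    with torch.no_grad():
        probs = F.softmax(model(unlabeled), dim=1)
        max_prob, pseudo = probs.max(dim=1)
        mask = max_prob > confidence              # only trust confident predictions
    logits = model(unlabeled)
    loss = F.cross_entropy(logits, pseudo, reduction="none")
    return (loss * mask.float()).mean()

# toy usage with a linear "model" on flattened patches
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 16 * 16, 10))
unlabeled = torch.rand(8, 3, 16, 16)
print(pseudo_label_loss(model, unlabeled).item())
```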

MARE -> self-supervised Multi-Attention REsu-net for semantic segmentation in remote sensing

SSGF-for-HRRS-scene-classification -> A semi-supervised generative framework with deep learning features for high-resolution remote sensing image scene classification

SFGAN -> Semantic-Fusion Gans for Semi-Supervised Satellite Image Classification

SSDAN -> Multi-Source Semi-Supervised Domain Adaptation Network for Remote Sensing Scene Classification

HR-S2DML -> High-Rankness Regularized Semi-Supervised Deep Metric Learning for Remote Sensing Imagery

Semantic Segmentation of Satellite Images Using Point Supervision

weak-segmentation -> Weakly supervised semantic segmentation for aerial images in pytorch

weakly_supervised -> Weakly Supervised Deep Learning for Segmentation of Remote Sensing Imagery. Demonstrates that segmentation can be performed using small datasets comprised of pixel or image labels

wan -> Weakly-Supervised Domain Adaptation for Built-up Region Segmentation in Aerial and Satellite Imagery

sourcerer -> A Bayesian-inspired deep learning method for semi-supervised domain adaptation designed for land cover mapping from satellite image time series (SITS)

MSMatch -> Semi-Supervised Multispectral Scene Classification with Few Labels. Includes code to work with both the RGB and the multispectral (MS) versions of EuroSAT dataset and the UC Merced Land Use (UCM) dataset

Flood Segmentation on Sentinel-1 SAR Imagery with Semi-Supervised Learning

Semi-supervised learning in satellite image classification -> experimenting with MixMatch and the EuroSAT data set

ICSS -> Weakly-supervised continual learning for class-incremental segmentation

es-CP -> Semi-Supervised Hyperspectral Image Classification Using a Probabilistic Pseudo-Label Generation Framework

Flood_Mapping_SSL -> Enhancement of Urban Floodwater Mapping From Aerial Imagery With Dense Shadows via Semisupervised Learning

Active learning

Supervised deep learning techniques typically require a huge number of annotated/labelled examples to provide a training dataset. However, labelling at scale takes significant time, expertise and resources. Active learning techniques aim to reduce the total amount of annotation that needs to be performed by selecting the most useful images to label from a large pool of unlabelled images, thus reducing the time to generate useful training datasets. These processes may be referred to as Human-in-the-Loop Machine Learning.
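A minimal sketch of uncertainty sampling, one of the simplest active learning strategies: score every unlabeled image by predictive entropy and send the most uncertain ones to annotators. The model and pool below are toy stand-ins.

```python
import torch
import torch.nn.functional as F

def select_for_labeling(model, pool, budget=10):
    """Return indices of the `budget` most uncertain images (highest predictive entropy)."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(pool), dim=1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    return entropy.topk(budget).indices

# toy usage
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 16 * 16, 5))
pool = torch.rand(100, 3, 16, 16)          # large pool of unlabeled patches
print(select_for_labeling(model, pool, budget=10))
```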

Active learning for object detection in high-resolution satellite images

AIDE V2 - Tools for detecting wildlife in aerial images using active learning

AstronomicAL -> An interactive dashboard for visualisation, integration and classification of data using Active Learning

Follow tutorials for active learning for object detection and segmentation on the lightly platform.

Active-Labeler by spaceml-org -> a CLI Tool that facilitates labeling datasets with just a SINGLE line of code

Labelling platform for Mapping Africa active learning project

ALS4GAN -> Active Learning for Improved Semi Supervised Semantic Segmentation in Satellite Images

DIAL -> DIAL: Deep Interactive and Active Learning for Semantic Segmentation in Remote Sensing

whales -> An active learning pipeline for identifying whales in high-resolution satellite imagery, by Microsoft

AL4EO -> a QGIS plug-in to run Active Learning techniques on Earth observation data

Federated learning

Federated learning is an approach to distributed machine learning in which a central server coordinates the training of a shared model across many clients, while the raw data never leaves those clients. Each client trains on its local data and sends only model updates back to the server, which aggregates them into global model parameters and returns these to the clients. This protects data privacy and makes it possible to train on datasets that are too large, too distributed or too sensitive to collect in one place, enabling privacy-preserving applications.
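A minimal sketch of FedAvg-style aggregation, the canonical federated learning recipe: each client trains locally and only its parameters are sent to the server, which averages them weighted by local dataset size. The tiny models and client sizes below are toy stand-ins.

```python
import copy
import torch

def fed_avg(global_model, client_states, client_sizes):
    """Weighted average of client model parameters; only parameters leave the clients."""
    total = sum(client_sizes)
    avg_state = copy.deepcopy(client_states[0])
    for key in avg_state:
        avg_state[key] = sum(
            state[key].float() * (n / total) for state, n in zip(client_states, client_sizes)
        )
    global_model.load_state_dict(avg_state)
    return global_model

# toy usage: three "clients" hold copies of a tiny model trained on their own data
global_model = torch.nn.Linear(4, 2)
clients = [copy.deepcopy(global_model) for _ in range(3)]
states = [c.state_dict() for c in clients]
global_model = fed_avg(global_model, states, client_sizes=[100, 250, 50])
print(global_model.weight.shape)
```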

Federated-Learning-for-Remote-Sensing -> implementation of three Federated Learning models

Semantic-Segmentation-UNet-Federated -> FedUKD: Federated UNet Model with Knowledge Distillation for Land Use Classification from Satellite and Street Views

MM-FL -> Learning Across Decentralized Multi-Modal Remote Sensing Archives with Federated Learning

Adversarial ML

Efforts to detect falsified images & deepfakes

UAE-RS -> dataset that provides black-box adversarial samples in the remote sensing field

PSGAN -> Perturbation Seeking Generative Adversarial Networks: A Defense Framework for Remote Sensing Image Scene Classification

SACNet -> Self-Attention Context Network: Addressing the Threat of Adversarial Attacks for Hyperspectral Image Classification

Image registration

Image registration is the process of registering one or more images onto another (typically well georeferenced) image. Traditionally this is performed manually by identifying control points (tie-points) in the images, for example using QGIS. This section lists approaches which mostly aim to automate this manual process. There is some overlap with the data fusion section but the distinction I make is that image registration is performed as a prerequisite to downstream processes which will use the registered data as an input.

Wikipedia article on registration -> register for change detection or image stitching

Phase correlation is used to estimate the XY translation between two images with sub-pixel accuracy. Can be used for accurate registration of low resolution imagery onto high resolution imagery, or to register a sub-image on a full image -> Unlike many spatial-domain algorithms, the phase correlation method is resilient to noise, occlusions, and other defects. With additional pre-processing image rotation and scale changes can also be calculated.
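A minimal example of phase correlation using scikit-image (assuming scikit-image and scipy are installed): a known sub-pixel shift is applied to a toy image and then recovered with phase_cross_correlation.

```python
import numpy as np
from skimage.registration import phase_cross_correlation
from scipy.ndimage import shift as subpixel_shift

# toy example: shift an image by a known offset and recover it with phase correlation
reference = np.random.rand(256, 256)
offset = (12.0, -7.0)                                   # (rows, cols) applied shift
moving = subpixel_shift(reference, offset, mode="wrap")

estimated, error, _ = phase_cross_correlation(reference, moving, upsample_factor=10)
print(estimated)   # approximately (-12, 7): the shift required to register `moving` onto `reference`

corrected = subpixel_shift(moving, estimated, mode="wrap")   # apply the correction
```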

How to Co-Register Temporal Stacks of Satellite Images

image-matching-models -> easily try 23 different image matching methods

ImageRegistration -> Interview assignment for multimodal image registration using SIFT

imreg_dft -> Image registration using discrete Fourier transform. Given two images it can calculate the difference between scale, rotation and position of imaged features.

arosics -> Perform automatic subpixel co-registration of two satellite image datasets using phase-correlation, XY translations only.

SubpixelAlignment -> Implementation of tiff image alignment through phase correlation for pixel- and subpixel-bias

cnn-registration -> An image registration method using convolutional neural network features written in Python2, Tensorflow 1.5

Siamese_ShiftNet -> NN predicting spatial coregistration shift of remote sensing imagery. Adapted from HighRes-net

ImageCoregistration -> Image registration with openCV using sift and RANSAC

mapalignment -> Aligning and Updating Cadaster Maps with Remote Sensing Images

CVPR21-Deep-Lucas-Kanade-Homography -> deep learning pipeline to accurately align challenging multimodality images. The method is based on traditional Lucas-Kanade algorithm with feature maps extracted by deep neural networks.

eolearn implements phase correlation, feature matching and ECC

Reprojecting the Perseverance landing footage onto satellite imagery

Kornia provides image registration

LoFTR -> Detector-Free Local Feature Matching with Transformers. Good performance matching satellite image pairs, tryout the web demo on your data

image-to-db-registration -> This remote module implements an algorithm for automated vector Database registration onto an Image. Implemented in the orfeo-toolbox

MS_HLMO_registration -> Multi-scale Histogram of Local Main Orientation for Remote Sensing Image Registration, with paper

cnn-matching -> Deep learning algorithm for feature matching of cross modality remote sensing images

Imatch-P -> A demo using SuperGlue and SuperPoint to do the image matching task based PaddlePaddle

NBR-Net -> A Non-rigid Bi-directional Registration Network for Multi-temporal Remote Sensing Images

MU-Net -> A Multi-Scale Framework with Unsupervised Learning for Remote Sensing Image Registration

unsupervisedDeepHomographyRAL2018 -> Unsupervised Deep Homography applied to aerial data

registration_cnn_ntg -> A Multispectral Image Registration Method Based on Unsupervised Learning

remote-sensing-images-registration-dataset -> at 0.23m, 3.75m & 30m resolution

semantic-template-matching -> A deep learning semantic template matching framework for remote sensing image registration

GMN-Generative-Matching-Network -> Deep Generative Matching Network for Optical and SAR Image Registration

SOMatch -> A deep learning framework for matching of SAR and optical imagery

Interspectral image registration dataset -> including satellite and drone imagery

RISG-image-matching -> A rotation invariant SuperGlue image matching algorithm

DeepAerialMatching_pytorch -> A Two-Stream Symmetric Network with Bidirectional Ensemble for Aerial Image Matching

DPCN -> Deep Phase Correlation for End-to-End Heterogeneous Sensor Measurements Matching

FSRA -> A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization

IHN -> Iterative Deep Homography Estimation

OSMNet -> Explore Better Network Framework for High-Resolution Optical and SAR Image Matching

L2_Siamese -> Registration of Multiresolution Remote Sensing Images Based on L2-Siamese Model

Multi-Step-Deformable-Registration -> Unsupervised Multi-Step Deformable Registration of Remote Sensing Imagery based on Deep Learning

Terrain mapping, Disparity Estimation, Lidar, DEMs & NeRF

Measure surface contours & locate 3D points in space from 2D images. NeRF stands for Neural Radiance Fields and is the term used in deep learning communities to describe a model that generates views of complex 3D scenes based on a partial set of 2D images

Wikipedia DEM article and phase correlation article

Intro to depth from stereo

Map terrain from stereo images to produce a digital elevation model (DEM) -> high resolution & paired images required, typically 0.3 m, e.g. Worldview

Process of creating a DEM here

ArcGIS can generate DEMs from stereo images

S2P -> S2P is a Python library and command line tool that implements a stereo pipeline which produces elevation models from images taken by high resolution optical satellites such as Pléiades, WorldView, QuickBird, Spot or Ikonos.

Predict the fate of glaciers

monodepth - Unsupervised single image depth prediction with CNNs

Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches

Terrain and hydrological analysis based on LiDAR-derived digital elevation models (DEM) - Python package

Phase correlation in scikit-image

3DCD -> Inferring 3D change detection from bitemporal optical images

The Mapbox API provides images and elevation maps, article here

Reconstructing 3D buildings from aerial LiDAR with Mask R-CNN

ResDepth -> A Deep Prior For 3D Reconstruction From High-resolution Satellite Images

overhead-geopose-challenge -> competition to build computer vision algorithms that can effectively model the height and pose of ground objects for monocular satellite images taken from oblique angles. Blog post MEET THE WINNERS OF THE OVERHEAD GEOPOSE CHALLENGE

cars -> a dedicated and open source 3D tool to produce Digital Surface Models from satellite imaging by photogrammetry. This Multiview stereo pipeline is intended for massive DSM production with a robust and performant design

ImageToDEM -> Generating Elevation Surface from a Single RGB Remotely Sensed Image Using a U-Net for generator and a PatchGAN for the discriminator

IMELE -> Building Height Estimation from Single-View Aerial Imagery

ridges -> deep semantic segmentation model for identifying ridges in topography

planet_tools -> Selection of imagery from Planet API for creation of stereo elevation models

SatelliteNeRF -> PyTorch-based Neural Radiance Fields adapted to satellite domain

SatelliteSfM -> A library for solving the satellite structure from motion problem

SatelliteSurfaceReconstruction -> 3D Surface Reconstruction From Multi-Date Satellite Images, ISPRS, 2021

son2sat -> A neural network coded in TensorFlow 1 that produces satellite images from acoustic images

aerial_mtl -> PyTorch implementation for multi-task learning with aerial images to learn both semantics and height from aerial image datasets; fuses RGB & lidar

ReKlaSat-3D -> 3D Reconstruction and Classification from Very High Resolution Satellite Imagery

M3Net -> A deep learning method for building height estimation using high-resolution multi-view imagery over urban areas

HMSM-Net -> Hierarchical multi-scale matching network for disparity estimation of high-resolution satellite stereo images

StereoMatchingRemoteSensing -> Dual-Scale Matching Network for Disparity Estimation of High-Resolution Remote Sensing Images

satnerf -> Learning Multi-View Satellite Photogrammetry With Transient Objects and Shadow Modeling Using RPC Cameras

SatMVS -> Rational Polynomial Camera Model Warping for Deep Learning Based Satellite Multi-View Stereo Matching

ImpliCity -> reconstructs digital surface models (DSMs) from raw photogrammetric 3D point clouds and ortho-images with the help of an implicit neural 3D scene representation

WHU-Stereo -> a large-scale dataset for stereo matching of high-resolution satellite imagery & several deep learning methods for stereo matching. Methods include StereoNet, Pyramid Stereo Matching Network & HMSM-Net

Photogrammetry-Guide -> A guide covering Photogrammetry including the applications, libraries and tools that will make you a better and more efficient Photogrammetry developer

DSM-to-DTM -> Exploring the use of machine learning to convert a Digital Surface Model (e.g. SRTM) to a Digital Terrain Model

GF-7_Stereo_Matching -> Large Scene DSM Generation of Gaofen-7 Imagery Combined with Deep Learning

Mapping drainage ditches in forested landscapes using deep learning and aerial laser scanning

Thermal Infrared

Thermal infrared remote sensing is a technique used to detect and measure thermal radiation emitted from the Earth’s surface. This technique can be used to measure the temperature of the ground and any objects on it and can detect the presence of different materials. Thermal infrared remote sensing is used to assess land cover, detect land-use changes, and monitor urban heat islands, as well as to measure the temperature of the ground during nighttime or in areas of limited visibility.
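A minimal sketch of converting thermal-band radiance to brightness temperature via the inverse Planck relation T = K2 / ln(K1/L + 1). The K1/K2 calibration constants are sensor- and band-specific and should be taken from your scene metadata; the values below are the published Landsat 8 TIRS Band 10 constants, used here only as an example, and the radiance values are toy data.

```python
import numpy as np

def brightness_temperature(radiance, k1, k2):
    """Convert top-of-atmosphere spectral radiance (W/(m^2 sr um)) from a thermal band
    to brightness temperature in Kelvin via T = K2 / ln(K1/L + 1)."""
    return k2 / np.log(k1 / radiance + 1.0)

# K1/K2 come from the scene metadata (e.g. the Landsat MTL file); these are the
# published Landsat 8 TIRS Band 10 constants, shown only as an example.
K1, K2 = 774.8853, 1321.0789
radiance = np.random.uniform(6.0, 11.0, size=(128, 128))   # toy radiance values
temperature_k = brightness_temperature(radiance, K1, K2)
celsius = temperature_k - 273.15
print(float(celsius.mean()))
```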

The World Needs (a lot) More Thermal Infrared Data from Space

Object_Classification_in_Thermal_Images -> classification accuracy was improved by adding the object size as a feature directly within the CNN

Thermal imaging with satellites blog post by Christoph Rieke

SAR

SAR (synthetic aperture radar) is used to detect and measure the properties of objects and surfaces on the Earth's surface. SAR can be used to detect changes in terrain, features, and objects over time, as well as to measure the size, shape, and composition of objects and surfaces. SAR can also be used to measure moisture levels in soil and vegetation, or to detect and monitor changes in land use.
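Since speckle comes up repeatedly in the entries below, here is a minimal numpy/scipy sketch of the classic Lee filter (not any of the deep-learning despecklers listed), plus the usual conversion of backscatter to decibels. The simulated intensity data is a toy stand-in for real SAR imagery.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(img, size=7):
    """Classic Lee speckle filter: a locally adaptive blend between each pixel and its
    local mean, weighted by how much the local variance exceeds the speckle variance."""
    local_mean = uniform_filter(img, size)
    local_sq_mean = uniform_filter(img ** 2, size)
    local_var = np.clip(local_sq_mean - local_mean ** 2, 0, None)
    noise_var = np.mean(local_var)                   # crude global estimate of speckle variance
    weights = local_var / (local_var + noise_var)
    return local_mean + weights * (img - local_mean)

# toy usage on simulated intensity data; backscatter is often viewed in decibels
intensity = np.random.gamma(shape=4.0, scale=0.25, size=(256, 256))  # speckle-like statistics
filtered = lee_filter(intensity, size=7)
sigma0_db = 10 * np.log10(np.clip(filtered, 1e-6, None))             # convert to dB
print(sigma0_db.shape)
```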

awesome-sar

awesome-sar-deep-learning

MERLIN -> self-supervised training of deep despeckling networks with MERLIN

You do not need clean images for SAR despeckling with deep learning -> How Speckle2Void learned to stop worrying and love the noise

PySAR - InSAR (Interferometric Synthetic Aperture Radar) timeseries analysis in python

Synthetic Aperture Radar (SAR) Analysis With Clarifai

Labeled SAR imagery dataset of ten geophysical phenomena from Sentinel-1 wave mode consists of more than 37,000 SAR vignettes divided into ten defined geophysical categories

Deep Learning and SAR Applications

Implementing an Ensemble Convolutional Neural Network on Sentinel-1 Synthetic Aperture Radar data and Sentinel-3 Radiometric data for detecting forest fires

s1_parking_occupancy -> PARKING OCCUPANCY ESTIMATION ON SENTINEL-1 IMAGES

Experiments on Flood Segmentation on Sentinel-1 SAR Imagery with Cyclical Pseudo Labeling and Noisy Student Training

SpaceNet_SAR_Buildings_Solutions -> The winning solutions for the SpaceNet 6 Challenge

Mapping and monitoring of infrastructure in desert regions with Sentinel-1

xView3 is a competition to detect dark vessels using computer vision and global SAR satellite imagery. First place solution and second place solution. Additional places up to fifth place are available at the xView GitHub Organization page

Winners of the STAC Overflow: Map Floodwater from Radar Imagery competition

deSpeckNet-TF-GEE -> deSpeckNet: Generalizing Deep Learning Based SAR Image Despeckling

cnn_sar_image_classification -> CNN for classifying SAR images of the Amazon Rainforest

s1_icetype_cnn -> Retrieve sea ice type from Sentinel-1 SAR with CNN

MP-ResNet -> Multi-path Residual Network for the Semantic Segmentation of PolSAR Images

TGRS_DisOptNet -> Distilling Semantic Knowledge from Optical Images for Weather-independent Building Segmentation

SAR_CD_DDNet -> PyTorch implementation of Change Detection in Synthetic Aperture Radar Images Using a Dual Domain Network

SAR_CD_MS_CapsNet -> Change Detection in SAR Images Based on Multiscale Capsule Network

Toushka Waterbodies Segmentation from four different combinations of Sentinel-1 SAR imagery and Digital Elevation Model with PyTorch and U-net -> code

sar_transformer -> Transformer based SAR image despeckling, trained with synthetic imagery, with paper

SSDD ship detection dataset

Ship Detection on Remote Sensing Synthetic Aperture Radar Data -> based on the architectures of the Faster-RCNN and YOLOv5 networks

Target Recognition in SAR -> Identify Military Vehicles in Satellite Imagery with TensorFlow, with article

DSN -> Deep SAR-Net: Learning objects from signals

SAR_denoising -> project on application of FFDNet to SAR images

cnninsar -> CNN-Based InSAR Denoising and Coherence Metric

sar -> Despeckling Synthetic Aperture Radar Images using a Deep Residual CNN

GCBANet -> A Global Context Boundary-Aware Network for SAR Ship Instance Segmentation

SAR_CD_GKSNet -> Change Detection from Synthetic Aperture Radar Images via Graph-Based Knowledge Supplement Network

pixel-wise-segmentation-of-sar -> Pixel-Wise Segmentation of SAR Imagery Using Encoder-Decoder Network and Fully-Connected CRF

SAR_Ship_detection_CFAR -> An improved two-parameter CFAR algorithm based on Rayleigh distribution and Mathematical Morphology for SAR ship detection

sar_snow_melt_timing -> notebooks and tools to identify snowmelt timing using timeseries analysis of backscatter of Sentinel-1 C-band SAR

Denoising radar satellite images using deep learning in Python -> Medium article on deepdespeckling

random-wetlands -> Random forest classification for wetland vegetation from synthetic aperture radar dataset

AGSDNet -> AGSDNet: Attention and Gradient-Based SAR Denoising Network

LFG-Net -> LFG-Net: Low-Level Feature Guided Network for Precise Ship Instance Segmentation in SAR Images

sar_sift -> Image registration algorithm

SAR-Despeckling -> toolbox

cogsima2022 -> Enhancing land subsidence awareness via InSAR data and Deep Transformers

XAI4SAR-PGIL -> Physically Explainable CNN for SAR Image Classification

PolSARFormer -> Local Window Attention Transformer for Polarimetric SAR Image Classification

DC4Flood -> A deep clustering framework for rapid flood detection using Sentinel-1 SAR imagery

Sentinel1-Flood-Finder -> Flood Finder Package from Sentinel 1 Imagery

NDVI - vegetation index

Normalized Difference Vegetation Index (NDVI) is an index used to measure the amount of healthy vegetation in a given area. It is calculated by taking the difference between the near-infrared (NIR) and red bands of a satellite image, and dividing by their sum. NDVI can be used to identify areas of healthy vegetation and to assess the health of vegetation in a given area. ndvi = np.true_divide((ir - r), (ir + r))
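A minimal example of computing NDVI from a raster with rasterio. The filename and band indices are assumptions for illustration only; band order varies by sensor (for Sentinel-2 L2A, red is B04 and NIR is B08).

```python
import numpy as np
import rasterio

# Band order varies by product: here we assume red is band 3 and NIR is band 4.
with rasterio.open("scene.tif") as src:          # hypothetical input raster
    red = src.read(3).astype("float32")
    nir = src.read(4).astype("float32")

# avoid division by zero where both bands are empty
ndvi = np.where((nir + red) > 0, (nir - red) / (nir + red), 0.0)
print(float(ndvi.min()), float(ndvi.max()))      # NDVI is bounded to [-1, 1]
```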

Example notebook local

Landsat data in cloud optimised (COG) format analysed for NDVI with medium article here .

Identifying Buildings in Satellite Images with Machine Learning and Quilt -> NDVI & edge detection via gaussian blur as features, fed to TPOT for training with labels from OpenStreetMap, modelled as a two class problem, “Buildings” and “Nature”

Seeing Through the Clouds - Predicting Vegetation Indices Using SAR

A walkthrough on calculating NDWI water index for flooded areas -> Derive zonal statistics from Sentinel 2 images using Rasterio and Geopandas

NDVI-Net -> NDVI-Net: A fusion network for generating high-resolution normalized difference vegetation index in remote sensing

Awesome-Vegetation-Index

Remote-Sensing-Indices-Derivation-Tool -> Calculate spectral remote sensing indices from satellite imagery

General image quality

Image quality describes the degree of accuracy with which an image can represent the original object. Image quality is typically measured by the amount of detail, sharpness, and contrast that an image contains. Factors that contribute to image quality include the resolution, format, and compression of the image.
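As a concrete, hedged example of measuring image quality, the sketch below computes two common full-reference metrics, MSE and PSNR, with plain NumPy; the 8-bit test images are hypothetical, and libraries such as piq (listed below) provide many more metrics.

import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two images of the same shape."""
    return float(np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2))

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    err = mse(a, b)
    if err == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / err)

# Hypothetical reference image and a noisy copy (8-bit values)
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
noisy = np.clip(ref + rng.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
print(f"MSE:  {mse(ref, noisy):.2f}")
print(f"PSNR: {psnr(ref, noisy):.2f} dB")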

lvrnet -> Lightweight Image Restoration for Aerial Images under Low Visibility

jitter-compensation -> Remote Sensing Image Jitter Detection and Compensation Using CNN

DeblurGANv2 -> Deblurring (Orders-of-Magnitude) Faster and Better

image-quality-assessment -> CNN to predict the aesthetic and technical quality of images

piq -> a collection of measures and metrics for image quality assessment

FFA-Net -> Feature Fusion Attention Network for Single Image Dehazing

DeepCalib -> A Deep Learning Approach for Automatic Intrinsic Calibration of Wide Field-of-View Cameras

PerceptualSimilarity -> LPIPS is a perceptual metric which aims to overcome the limitations of traditional metrics such as PSNR & SSIM, to better represent the features the human eye picks up on

Hyperspectral-Deblurring-and-Destriping

HyDe -> Hyperspectral Denoising algorithm toolbox in Python

HLF-DIP -> Unsupervised Hyperspectral Denoising Based on Deep Image Prior and Least Favorable Distribution

RQUNetVAE -> Riesz-Quincunx-UNet Variational Auto-Encoder for Satellite Image Denoising

deep-hs-prior -> Deep Hyperspectral Prior: Denoising, Inpainting, Super-Resolution

iquaflow -> from Satellogic, an image quality framework that aims at providing a set of tools to assess image quality by using the performance of AI models trained on the images as a proxy.

Synthetic data

Training data can be hard to acquire, particularly for rare events such as change detection after disasters, or imagery of rare classes of objects. In these situations, generating synthetic training data might be the only option. This has become quite sophisticated, with 3D models being used with open source game engines such as Unreal.

The Synthinel-1 dataset: a collection of high resolution synthetic overhead imagery for building segmentation with repo

RarePlanes -> incorporates both real and synthetically generated satellite imagery including aircraft. Read the arxiv paper and check out this repo. Note the dataset is available through the AWS Open-Data Program for free download

Read this article from NVIDIA which discusses fine tuning a model pre-trained on synthetic data (Rareplanes) with 10% real data, then pruning the model to reduce its size, before quantizing the model to improve inference speed

Combining Synthetic Data with Real Data to Improve Detection Results in Satellite Imagery

BlenderGIS could be used for synthetic data generation

bifrost.ai -> simulated data service with geospatial output data formats

oktal-se -> software for generating simulated data across a wide range of bands including optical and SAR

rendered.ai -> The Platform as a Service for Creating Synthetic Data

synthetic_xview_airplanes -> creation of airplanes synthetic dataset using ArcGIS CityEngine

Combining Synthetic Data with Real Data to Improve Detection Results in Satellite Imagery: Case Study

Import OpenStreetMap data into Unreal Engine 4

deepfake-satellite-images -> dataset that includes over 1M images of synthetic aerial images

synthetic-disaster -> Generate synthetic satellite images of natural disasters using deep neural networks

STPLS3D -> A Large-Scale Synthetic and Real Aerial Photogrammetry 3D Point Cloud Dataset

LESS -> LargE-Scale remote sensing data and image Simulation framework over heterogeneous 3D scenes

Synthesizing Robustness: Dataset Size Requirements and Geographic Insights -> Medium article, concludes that synthetic data is most beneficial to the rarest object classes and that extracting utility from synthetic data often takes significant effort and creativity

rs_img_synth -> Synthesizing Optical and SAR Imagery From Land Cover Maps and Auxiliary Raster Data

OnlyPlanes -> dataset and pretrained models for the paper: OnlyPlanes - Incrementally Tuning Synthetic Training Datasets for Satellite Object Detection

awesome-remote-sensing-vision-language-models

Awesome-Remote-Sensing-Multimodal-Large-Language-Model

Remote-Sensing-ChatGPT -> an open source tool for solving remote sensing tasks with ChatGPT in an interactive way.

ChangeCLIP -> ChangeCLIP: Remote sensing change detection with multimodal vision-language representation learning

SkyEyeGPT -> SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model

RemoteCLIP -> A Vision Language Foundation Model for Remote Sensing

GeoChat -> Grounded Large Vision-Language Model for Remote Sensing

labs-gpt-stac -> connect ChatGPT to a STAC API backend

EarthGPT -> A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain

H2RSVLM -> Towards Helpful and Honest Remote Sensing Large Vision Language Model

LLMs & FMs in Smart Agriculture -> Large Language Models and Foundation Models in Smart Agriculture: Basics, Opportunities, and Challenges

LHRS-Bot -> Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model

Awesome-VLGFM -> Towards Vision-Language Geo-Foundation Models: A Survey

Awesome Remote Sensing Foundation Models

Clay Foundation Model -> an open source AI model and interface for Earth.

TerraTorch -> a Python toolkit for fine-tuning Geospatial Foundation Models from IBM, based on PyTorch Lightning and TorchGeo

EarthPT -> A time series foundation model for Earth Observation

SpectralGPT -> Spectral remote sensing foundation model, with finetuning on classification, segmentation, and change detection tasks

DOFA-pytorch -> Dynamic One-For-All (DOFA) multimodal foundation models for Earth vision reference implementation

Prithvi foundation model -> also see the Baseline Model for Segmentation

prithvi-pytorch -> makes Prithvi usable from Pytorch Lightning

geo-bench -> a General Earth Observation benchmark for evaluating the performances of large pre-trained models on geospatial data

USat -> A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery

hydro-foundation-model -> A Foundation Model for Water in Satellite Imagery

RSBuilding -> Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model

Text2Seg -> a pipeline that combined multiple Vision Foundation Models (SAM, CLIP, GroundingDINO) to perform semantic segmentation.

Remote-Sensing-RVSA -> Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model

FoMo-Bench -> a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models

MTP -> Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

DiffusionSat -> A Generative Foundation Model For Satellite Imagery

granite-geospatial-biomass -> A geospatial model for Above Ground Biomass from IBM


Digital image processing

Master's thesis by Vinh Thuc Ha (additional names on the record: Robert Wong, Behzad Bavarian), Electrical and Computer Engineering, California State University, Northridge (2017). Includes bibliographical references (page 59). http://hdl.handle.net/10211.3/189576



Title: A systematic review of the use of Deep Learning in Satellite Imagery for Agriculture (Brandon Victor, Zhen He, Aiden Nibali; arXiv, cs.CV)

Abstract: Agricultural research is essential for increasing food production to meet the requirements of an increasing population in the coming decades. Recently, satellite technology has been improving rapidly and deep learning has seen much success in generic computer vision tasks and many application areas which presents an important opportunity to improve analysis of agricultural land. Here we present a systematic review of 150 studies to find the current uses of deep learning on satellite imagery for agricultural research. Although we identify 5 categories of agricultural monitoring tasks, the majority of the research interest is in crop segmentation and yield prediction. We found that, when used, modern deep learning methods consistently outperformed traditional machine learning across most tasks; the only exception was that Long Short-Term Memory (LSTM) Recurrent Neural Networks did not consistently outperform Random Forests (RF) for yield prediction. The reviewed studies have largely adopted methodologies from generic computer vision, except for one major omission: benchmark datasets are not utilised to evaluate models across studies, making it difficult to compare results. Additionally, some studies have specifically utilised the extra spectral resolution available in satellite imagery, but other divergent properties of satellite images - such as the hugely different scales of spatial patterns - are not being taken advantage of in the reviewed studies.

Learning efficient image processing pipelines

Doctoral thesis available from DSpace@MIT (MIT Libraries, Doctoral Theses collection).


Satellite Image Processing Projects (www.matlabsimulation.com)


Satellite image processing is a broad research field in which satellite images are analysed and enhanced. It attracts many scholars because of its wide range of real-world applications. The skills and topics below illustrate the scope of satellite image processing projects.

Strengths of Satellite Imaging

  • High spatial resolution and broad coverage
  • Terrain and ocean imaging
  • Multispectral and multi-temporal acquisition
  • Reduced sensitivity to weather
  • Clear and precise images
  • Easy to georectify

Topics Under Satellite Image Processing Projects

  • Forest carbon flux estimation in Landsat images
  • Relative radiometric correction in CCD images
  • Image fusion for object-oriented mapping
  • Land and vegetation cover estimation
  • Land classification in multispectral and hyperspectral images
  • Content-based registration

Satellite Image Processing Projects Using Matlab Simulink

Beyond the topics listed above, there are many more satellite-based projects. Plant disease detection is currently a key research field, since food plays a vital part in daily life, and satellite images are increasingly used for it: images from the satellite are passed through image processing steps to detect the disease. Some plant diseases that have been detected using satellite imagery are listed below.

Plant Diseases Detected Using Satellite Image

  • Coffee berry necrosis
  • Texas root rot
  • Soilborne and seedborne diseases
  • Citrus greening
  • Olive tree stress
  • Wheat yellow rust

In satellite imaging, datasets are essential for testing algorithms. Choosing a satellite image processing project is not an easy task, but with the right guidance you can achieve strong results. A few datasets are listed here for reference.

Datasets

  • Vegetation Whole Globe Products
  • LANDSAT

Benefits of Satellite Imaging

  • Mineral deposit extraction
  • Disaster mitigation planning and recovery
  • Volcano and hazard monitoring
  • Agriculture and biodiversity conservation
  • Forestry and fisheries harvesting
  • Seismic risk assessment


Naval Postgraduate School, Dudley Knox Library Research Guide: Satellite Communications

Space Systems Operations Thesis Research Guide

  • Space Systems Operations (SSO) Thesis Research Guide: This guide has been prepared for SSO students to assist them in the research and writing of their master’s thesis. It outlines the specific steps necessary to successfully complete a thesis of high quality and has been written specifically for the student. See the SSO Thesis Guide link under the Thesis Documents and Topics section.

Satellite Communication Theses

NPS Archive: Calhoun

NPS Theses & Dissertations

You will find links to both archives (Public and Restricted).

  • NPS Theses Go to this page to access both the NPS Archive: Calhoun (publicly releasable) and the Restricted Collection (Controlled Unclassified Information)

NPS Theses & Dissertations (publicly releasable)

  • NPS Archive: Calhoun

Find publicly releasable NPS-authored scholarly works (Theses, Dissertations, Capstone Project Reports, MBA Professional Reports, Joint Applied Project Reports, and other NPS degree-earning written works)

NPS Outstanding Theses (publicly releasable)

  • NPS Outstanding Theses

Outstanding Thesis recognitions are given to students who have exhibited exceptional quality in their thesis research work.

  • Publicly releasable only
  • This list is school-wide, but you can filter by Department using the dropdown menus

NPS Theses & Dissertations (restricted)


NPS-authored scholarly works (NPS Theses, Dissertations, Capstone Project Reports, MBA Professional Reports, Joint Applied Project Reports, and other degree-earning works)

  Who can access?

To request an account, you must be both of these:

  • Current  NPS member 
  • Federal employee/contractor or U.S. military

If asked to login to Microsoft, use your NPS email address.

Access available to U.S. citizens only


  • Restricted Resources: the official archive for restricted NPS Theses, Dissertations, Capstone Project Reports, and other documents.

Non-NPS Theses, Dissertations & Reports

  • Defense Technical Information Center (DTIC) (publicly releasable): Publicly releasable documents only, in all areas of defense research (science, technology, engineering, social science, business & budget) plus DOD directives & instructions. DTIC is the central scientific, research, and engineering information support activity for the DOD. Each product listed on the page must be searched separately, and the archive has limited search filtering capabilities. For a better search experience, use the National Technical Reports Library or, if eligible, register for the DTIC R&E Gateway and use the Advanced Search to filter for "A - Approved for Public Release" documents.


  • E-Theses Online Service (EThOS) UK’s national thesis service for doctoral research theses
  • National Technical Reports Library (National Technical Information Service) Scientific, technical, engineering & related business information by or for the U.S. government & complementary materials from international sources. Includes publicly accessible DTIC reports in their Technical Reports Library. NTIS has a more fully functioning search interface.
  • Networked Digital Library of Theses and Dissertations (NDLTD)
  • OAIster Multidisciplinary, academically-oriented digital resources including theses and dissertations
  • Open Access Theses and Dissertations (OATD) Open access graduate theses and dissertations published around the world
  • PQDT Open (ProQuest) Open access dissertations and theses




What is Digital Image Processing?

Digital image processing is the process of using computer algorithms to perform image processing on digital images. The latest topics in digital image processing for research and thesis work are based on these algorithms. As a subcategory of digital signal processing, digital image processing carries many advantages over analog image processing: it permits applying multiple algorithms to the input data and avoids problems such as the build-up of noise and signal distortion during processing. Because images are defined over two or more dimensions, digital image processing can be modelled as a multidimensional system. The history of digital image processing dates back to the early 1920s, when its first applications appeared. Many students choose this field for their M.Tech as well as Ph.D. theses. There are various thesis topics in digital image processing for M.Tech, M.Phil and Ph.D. students; a list is given here. Before going into topics in image processing, you should have some basic knowledge of image processing.


Latest research topics in image processing for research scholars:

  • A hybrid classification scheme for plant disease detection in image processing
  • An edge detection scheme in image processing using ant and bee colony optimization
  • An improved PNLM filtering scheme to denoise MRI images
  • A classification method for brain tumor detection
  • A CNN approach for lung cancer detection in image processing
  • A neural network method for diabetic retinopathy detection
  • A copy-move forgery detection approach using textual feature extraction
  • A face spoof detection method based on eigen feature extraction and classification
  • A classification and segmentation method for number plate detection

Formation of Digital Images

Firstly, the image is captured by a camera using sunlight as the source of energy. For the acquisition of the image, a sensor array is used. These sensors sense the amount of light reflected by the object when light falls on that object. A continuous voltage signal is generated when the data is being sensed. The data collected is converted into a digital format to create digital images. For this process, sampling and quantization methods are applied. This will create a 2-dimensional array of numbers which will be a digital image.
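A minimal sketch of the sampling and quantization step described above, assuming a hypothetical continuous scene function that returns brightness in [0, 1]; the grid size and number of levels are illustrative.

import numpy as np

def acquire_image(scene, height=64, width=64, levels=256):
    """Sample a continuous scene function on a grid and quantize it.

    scene(x, y) should return brightness in [0, 1] for x, y in [0, 1].
    """
    ys, xs = np.meshgrid(np.linspace(0, 1, height),
                         np.linspace(0, 1, width), indexing="ij")
    samples = scene(xs, ys)                       # spatial sampling
    samples = np.clip(samples, 0.0, 1.0)
    digital = np.round(samples * (levels - 1))    # quantization to discrete levels
    return digital.astype(np.uint8)               # 2-D array of numbers = digital image

# Hypothetical continuous scene: a smooth diagonal gradient
image = acquire_image(lambda x, y: (x + y) / 2)
print(image.shape, image.dtype, image.min(), image.max())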

Why is Image Processing Required?

Image Processing serves the following main purposes:
  • Visualization of the hidden objects in the image.
  • Enhancement of the image through sharpening and restoration.
  • Seek valuable information from the images.
  • Measuring different patterns of objects in the image.
  • Distinguishing different objects in the image.

Applications of Digital Image Processing

There are various applications of digital image processing, any of which can also be a good thesis topic in image processing. The main applications are:
  • Image Processing is used to enhance the image quality through techniques like image sharpening and restoration. The images can be altered to achieve the desired results.
  • Digital Image Processing finds its application in the medical field for gamma-ray imaging, PET Scan, X-ray imaging, UV imaging.
  • It is used for transmission and encoding.
  • It is used in color processing in which processing of colored images is done using different color spaces.
  • Image Processing finds its application in machine learning for pattern recognition.

List of topics in image processing for thesis and research

There are various topics in digital image processing for thesis and research. Here is a list of the latest thesis and research topics in digital image processing:
  • Image Acquisition
  • Image Enhancement
  • Image Restoration
  • Color Image Processing
  • Wavelets and Multi Resolution Processing
  • Compression
  • Morphological Processing
  • Segmentation
  • Representation and Description
  • Object recognition
  • Knowledge Base

1. Image Acquisition:

Image acquisition is the first and most important step of digital image processing. In the simplest case an image is already available in digital form and acquisition involves only preprocessing such as scaling. It starts with the capture of an image by a sensor (such as a monochrome or color TV camera), which is then digitized; if the output of the camera or sensor is not in digital form, an analog-to-digital converter (ADC) digitizes it. If the image is not properly acquired, the subsequent tasks cannot be achieved well. Customized hardware is used for advanced image acquisition techniques, 3D image acquisition being one such advanced method. Students can pursue this method for their master’s thesis and research.

2. Image Enhancement:

Image enhancement is one of the easiest and most important areas of digital image processing. The core idea behind image enhancement is to bring out information that is obscured or to highlight specific features according to the requirements, such as changing brightness and contrast. Basically, it involves manipulating an image to obtain a more suitable image than the original for a specific application. Many algorithms have been designed for image enhancement to change an image’s contrast, brightness, and other such properties. Image enhancement aims to change the human perception of the images. Image enhancement techniques are of two types: spatial domain and frequency domain.
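To illustrate spatial-domain enhancement, here is a minimal NumPy sketch of a linear brightness/contrast adjustment and a simple histogram equalization; the input image and parameter values are hypothetical.

import numpy as np

def adjust_brightness_contrast(img: np.ndarray, alpha: float = 1.2, beta: float = 10.0):
    """Linear point operation: out = alpha * img + beta, clipped to the 8-bit range."""
    out = alpha * img.astype(np.float32) + beta
    return np.clip(out, 0, 255).astype(np.uint8)

def histogram_equalize(img: np.ndarray) -> np.ndarray:
    """Spread the intensity histogram of an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum() / img.size                  # cumulative distribution
    lut = np.round(cdf * 255).astype(np.uint8)      # map old levels to new levels
    return lut[img]

# Hypothetical low-contrast image
rng = np.random.default_rng(0)
img = rng.integers(100, 140, size=(32, 32), dtype=np.uint8)
print(adjust_brightness_contrast(img).max(), histogram_equalize(img).max())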

3. Image Restoration:

Image restoration involves improving the appearance of an image. In comparison to image enhancement, which is subjective, image restoration is completely objective, in the sense that restoration techniques are based on probabilistic or mathematical models of image degradation. Image restoration removes blur and noise from images to produce a clean, original-looking image, and it can be a good choice for an M.Tech thesis on image processing. The image information lost during blurring is restored through a reversal process, which differs from image enhancement. Deconvolution is used and is performed in the frequency domain. The main defects that degrade an image are restored here.
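As a hedged illustration of frequency-domain restoration, the sketch below implements a Wiener-style inverse filter with NumPy FFTs; the point spread function, test image, and regularization constant k are assumptions chosen for the example, not a specific published pipeline.

import numpy as np

def wiener_deconvolve(blurred: np.ndarray, psf: np.ndarray, k: float = 0.01) -> np.ndarray:
    """Restore a blurred image given its point spread function (PSF).

    In the frequency domain: X_hat = Y * conj(H) / (|H|^2 + k),
    where k regularizes frequencies at which the PSF has little energy.
    """
    H = np.fft.fft2(psf, s=blurred.shape)           # PSF padded to the image size
    Y = np.fft.fft2(blurred)
    X_hat = Y * np.conj(H) / (np.abs(H) ** 2 + k)
    return np.real(np.fft.ifft2(X_hat))

# Hypothetical example: blur a simple image with a 5x5 box PSF, then restore it
img = np.zeros((64, 64))
img[24:40, 24:40] = 1.0
psf = np.ones((5, 5)) / 25.0
blurred = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(psf, s=img.shape)))
restored = wiener_deconvolve(blurred, psf)
print(np.abs(restored - img).mean())   # small residual error after restoration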

4. Color Image Processing:

Color image processing has proved to be of great interest because of the significant increase in the use of digital images on the Internet. It includes color modeling and processing in a digital domain. Various color models are used to specify a color using a 3D coordinate system: the RGB, CMY, HSI, and YIQ models. Color image processing is important because humans can perceive thousands of colors. There are two areas of color image processing: full-color processing and pseudo-color processing. In full-color processing the image is processed in full color, while in pseudo-color processing grayscale images are converted to colored images. It is an interesting topic in image processing.
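A minimal sketch of the pseudo-color branch, assuming an 8-bit grayscale input: each gray level is mapped to a color by interpolating between two illustrative endpoint colors (the images and colors are hypothetical).

import numpy as np

def pseudocolor(gray: np.ndarray,
                low=(0, 0, 128),      # dark blue for low intensities (illustrative)
                high=(255, 255, 0)):  # yellow for high intensities (illustrative)
    """Map an 8-bit grayscale image to RGB by interpolating between two colors."""
    t = gray.astype(np.float32)[..., None] / 255.0          # HxWx1 weights in [0, 1]
    low = np.array(low, dtype=np.float32)
    high = np.array(high, dtype=np.float32)
    rgb = (1.0 - t) * low + t * high                         # per-pixel interpolation
    return rgb.astype(np.uint8)

# Hypothetical grayscale ramp: 32 rows of the values 0..255
gray = np.tile(np.arange(256, dtype=np.uint8), (32, 1))
print(pseudocolor(gray).shape)   # (32, 256, 3)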


THESIS ON IMAGE PROCESSING

Many research scholars choose their thesis work in the domain of image processing. Image processing operates on images and produces images in which the visibility of features is improved, facilitating subsequent analysis; it is used to enhance raw images received from sources such as cameras and sensors. An image processing thesis depends on both the implementation and the paper preparation.

Image Processing operations:

  • Enhancement of important image details while suppressing other information
  • Correction of image imperfections and defects
  • Creation of binary representations
  • Performance measurement of extracted features

Applications of Image Processing:

  • Forensic Studies.
  • Military Applications.
  • Document Processing.
  • Non-Destructive Evaluation.
  • Graphic Arts.

Thesis Topics for Image Processing:

  • Content based Image Retrieval
  • Intelligent Transportation System
  • Satellite Imaging
  • Forensic Image Processing

Content-based Image Retrieval

  • Search analyzes the content of the image itself rather than metadata such as keywords, tags, or descriptions associated with the image
  • Content may be described in terms of shape, color, and texture (a minimal color-histogram retrieval sketch is given below)
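A minimal sketch of the idea, assuming RGB images stored as NumPy arrays: each image is summarized by a normalized joint color histogram and retrieval ranks database images by histogram intersection with the query. The bin count, similarity measure, and toy database are illustrative choices, not a specific published method.

import numpy as np

def color_histogram(img: np.ndarray, bins: int = 8) -> np.ndarray:
    """Normalized joint RGB histogram used as a content descriptor."""
    hist, _ = np.histogramdd(img.reshape(-1, 3),
                             bins=(bins, bins, bins),
                             range=((0, 256), (0, 256), (0, 256)))
    hist = hist.ravel()
    return hist / hist.sum()

def rank_by_similarity(query: np.ndarray, database: list) -> list:
    """Return database indices sorted by histogram intersection with the query."""
    q = color_histogram(query)
    scores = [np.minimum(q, color_histogram(img)).sum() for img in database]
    return sorted(range(len(database)), key=lambda i: scores[i], reverse=True)

# Hypothetical 3-image database; query with one of its own images
rng = np.random.default_rng(0)
db = [rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8) for _ in range(3)]
print(rank_by_similarity(db[1], db))   # index 1 should rank first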

Recent CBIR Issues

  • Image Blocking
  • Color and Texture Feature Derivation
  • Color Indexing

Security

  • Security is maintained using techniques such as cryptography, steganography, and watermarking

Intelligent Transportation System

  • Video retrieval is applied within intelligent transportation systems

Satellite Imaging

  • Geographic Information System (GIS) based concepts are developed under MATLAB simulation
  • Research problems take remote sensing input images such as SAR, radar, sonar, and ASTER

Forensic Image Processing

  • Involves the computer restoration and enhancement of surveillance imagery
  • To maximize information extraction from surveillance imagery

Methods of Image Processing

  • Adaptive Transformations
  • Watermarking
  • Sparse Representation
  • Dictionary learning

Adaptive Transformation

  • Adaptive transformation combines classical methods and adaptive methods
  • The adaptive method's constituents are a stepping rule, a stopping criterion, and a threshold estimate
  • Performance is measured by bias, precision, and efficiency

Watermarking

  • Embedding data into an image
  • The data is stored in the actual pixel values (a minimal LSB embedding sketch is given below)
  • The image format may change
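A minimal sketch of pixel-domain embedding, assuming an 8-bit grayscale cover image: the watermark bits replace the least significant bit (LSB) of the first pixels. The cover image, watermark length, and bit layout are hypothetical.

import numpy as np

def embed_lsb(cover: np.ndarray, watermark_bits: np.ndarray) -> np.ndarray:
    """Hide a flat array of 0/1 bits in the least significant bits of the cover pixels."""
    flat = cover.ravel().copy()
    n = watermark_bits.size
    flat[:n] = (flat[:n] & 0xFE) | watermark_bits    # clear the LSB, then set it
    return flat.reshape(cover.shape)

def extract_lsb(watermarked: np.ndarray, n_bits: int) -> np.ndarray:
    """Read the first n_bits least significant bits back out."""
    return watermarked.ravel()[:n_bits] & 1

# Hypothetical cover image and a short watermark
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
bits = rng.integers(0, 2, size=64, dtype=np.uint8)
stego = embed_lsb(cover, bits)
assert np.array_equal(extract_lsb(stego, 64), bits)
print("max pixel change:", int(np.max(np.abs(stego.astype(int) - cover.astype(int)))))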

Sparse Representation

  • Sparse signals can be exactly reconstructed from a small number of elementary signals
  • In collaborative sparse coding, a single data point may have more than one sparse representation with similar data-fitting errors
  • Sparse representations are classified into linear and non-linear representations

Dictionary Learning

  • Dictionary learning can be solved by a class-dependent supervised simultaneous orthogonal matching pursuit
  • The problem is NP-hard, but greedy optimizations are possible
  • Compute the redundant dictionary size that minimizes the average non-linear approximation error

MATLAB Sample Code for Curve Fitting

x = [1 2 3 4 5 6];
y = [5.5 43.1 128 290.7 498.4 978.67];   % data
p = polyfit(x, y, 4)                     % get the polynomial
% Compute the values of the polyfit estimate over a finer range,
% and plot the estimate over the real data values for comparison:
x2 = 1:.1:6;
y2 = polyval(p, x2);
plot(x, y, 'o', x2, y2)
grid on

MATLAB Sample Code for Image Retrieval Using Bag of Features

% Select image features for retrieval:
% Display a few of the flower images
helperDisplayImageMontage(flowerImageSet.ImageLocation(1:50:1000));

% Create bag of features:
% Load pre-trained bagOfFeatures
load('savedColorBagOfFeatures.mat', 'colorBag');

% Index images:
% Create a search index
% flowerImageIndex = indexImages(flowerImageSet, colorBag, 'SaveFeatureLocations', false);
% Load the pre-saved index
load('savedColorBagOfFeatures.mat', 'flowerImageIndex');

% Search similar images:
% Define a query image
queryImage = read(flowerImageSet, 502);

figure
imshow(queryImage)

% Search for the top 20 images with similar color content
[imageIDs, scores] = retrieveImages(queryImage, flowerImageIndex);

Research Topics

Biomedical Imaging

The current plethora of imaging technologies such as magnetic resonance imaging (MR), computed tomography (CT), positron emission tomography (PET), optical coherence tomography (OCT), and ultrasound provide great insight into the different anatomical and functional processes of the human body.

Computer Vision

Computer vision is the science and technology of teaching a computer to interpret images and video as well as a typical human. Technically, computer vision encompasses the fields of image/video processing, pattern recognition, biological vision, artificial intelligence, augmented reality, mathematical modeling, statistics, probability, optimization, 2D sensors, and photography.

Image Segmentation/Classification

Extracting information from a digital image often depends on first identifying desired objects or breaking down the image into homogenous regions (a process called 'segmentation') and then assigning these objects to particular classes (a process called 'classification'). This is a fundamental part of computer vision, combining image processing and pattern recognition techniques.
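A minimal sketch of the segment-then-classify idea, assuming a grayscale image and SciPy's connected-component labeling; the threshold and the area-based "classification" rule are illustrative assumptions, not a specific method used by the lab.

import numpy as np
from scipy import ndimage

def segment_and_classify(img: np.ndarray, threshold: float, min_area: int = 20):
    """Threshold an image, label connected regions, and classify each region by area."""
    mask = img > threshold                      # segmentation into foreground/background
    labels, n = ndimage.label(mask)             # connected-component labeling
    classes = {}
    for region_id in range(1, n + 1):
        area = int((labels == region_id).sum())
        # Hypothetical rule: large regions are "objects", small ones "noise"
        classes[region_id] = "object" if area >= min_area else "noise"
    return labels, classes

# Hypothetical image with one large and one tiny bright blob
img = np.zeros((64, 64))
img[10:30, 10:30] = 1.0
img[50:52, 50:52] = 1.0
labels, classes = segment_and_classify(img, threshold=0.5)
print(classes)   # e.g. {1: 'object', 2: 'noise'}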

Multiresolution Techniques

The VIP lab has a particularly extensive history with multiresolution methods, and a significant number of research students have explored this theme. Multiresolution methods are very broad, essentially meaning that an image or video is modeled, represented, or has features extracted on more than one scale, allowing both local and non-local phenomena to be captured.

Remote Sensing

Remote sensing, or the science of capturing data of the earth from airplanes or satellites, enables regular monitoring of land, ocean, and atmosphere expanses, representing data that cannot be captured using any other means. A vast amount of information is generated by remote sensing platforms and there is an obvious need to analyze the data accurately and efficiently.

Scientific Imaging

Scientific Imaging refers to working on two- or three-dimensional imagery taken for a scientific purpose, in most cases acquired either through a microscope or remotely-sensed images taken at a distance.

Stochastic Models

In many image processing, computer vision, and pattern recognition applications, there is often a large degree of uncertainty associated with factors such as the appearance of the underlying scene within the acquired data, the location and trajectory of the object of interest, the physical appearance (e.g., size, shape, color, etc.) of the objects being detected, etc.

Video Analysis

Video analysis is a field within computer vision that involves the automatic interpretation of digital video using computer algorithms. Although humans are readily able to interpret digital video, developing algorithms for the computer to perform the same task has proven highly elusive and is now an active research field.

Evolutionary Deep Intelligence

Deep learning has shown considerable promise in recent years, producing tremendous results and significantly improving the accuracy of a variety of challenging problems when compared to other machine learning methods.

Discovery Radiomics

Radiomics, which involves the high-throughput extraction and analysis of a large amount of quantitative features from medical imaging data to characterize tumor phenotype in a quantitative manner, is ushering in a new era of imaging-driven quantitative personalized cancer decision support and management. 

Sports Analytics

Sports Analytics is a growing field in computer vision that analyzes visual cues from images to provide statistical data on players, teams, and games. Want to know how a player's technique improves the quality of the team? Can a team, based on their defensive position, increase their chances to the finals? These are a few out of a plethora of questions that are answered in sports analytics.



Leveling Up Research and Publishing in Geoscience’s Open-Source Era 

A new workshop held at usc’s information sciences institute hopes to have a ripple effect throughout the geoscience community..


Ancient rocks from the Southern Ocean; satellite images of the Earth’s atmosphere; soil samples from the Mojave Desert trenches. The 18 geoscientists who gathered for the first FROGS (Facilitating Reproducible Open GeoScience) workshop at USC Viterbi’s Information Sciences Institute (ISI) from June 3-6, 2024 may study a wide and diverse range of data sources, but they came with a common goal: to learn how to level up their research, data sharing, and publishing techniques.

Participants like Jhon Mojica, a senior researcher at the University of Miami working with NOAA (National Oceanic and Atmospheric Administration), got an introduction to Scientific Python and R programming languages. Mojica said, “I’m leading projects on the expansion of Port Everglades and water quality around South Florida. Learning to use Python to automate data processing will make our studies more robust and efficient.”

They were taught methods such as spectral analysis to interpret environmental variability over different timescales. Pranaykumar Tirpude, a Ph.D. student at the University of Delaware whose research involves studying 1.4 million years of data from the Southern Ocean said, “Implementing these techniques will help me better understand climate cycles and the stability of ice sheets over geological time.”

Setting sail with PyRATES

The workshop was hosted by LinkedEarth, an initiative that brings together AI and paleoclimate research to create a cohesive understanding of historical climate data by revolutionizing the way data is managed and analyzed.

This particular curriculum, dubbed PyRATES (Python and R Analysis of Time SerieS), catered to researchers with little to no experience in computer programming languages (i.e., Python and R) and was led by Deborah Khider, paleoclimatologist and Research Scientist at ISI; Julien Emile-Geay, Professor of Earth Sciences at USC Dornsife College of Letters, Arts & Sciences; and, from Northern Arizona University's School of Earth and Sustainability, Associate Professor Nick McKay and Data Scientist David Edge.

Khider, Emile-Geay, McKay and Edge set out with a lofty goal – “to elevate participants’ research to the next level by equipping them with advanced techniques that were previously beyond their reach, and helping them apply these methods to a broader array of datasets than previously thought possible.”  Khider explained, “With PyRATES, we wanted participants to come out with the skills to do their science using open source libraries while sharing their own project openly and in a reproducible manner.”

Embracing open science with FAIR publishing

In addition to the advanced research methods, FROGS participants were taught best practices for FAIR science publishing – the principles of making research Findable, Accessible, Interoperable and Reusable. Through hands-on sessions, they learned techniques for data versioning, managing metadata, using open data repositories, and applying appropriate licensing. These skills enhance reproducibility and collaboration, ensuring data and findings are easily shared, accessed, and usable by others in the scientific community, meeting the evolving standards of journals and funding agencies.

Kathryn Chen, a biological oceanographer at the Scripps Institution of Oceanography, highlighted the workshop’s emphasis on this, “I am currently drafting my first paper, so I particularly appreciated the FAIR publishing aspect. Learning about versioning datasets, code, and workflows has been instrumental for my research.” 

In the scientists’ own words…

The workshop drew participants from various fields within geosciences, each bringing unique perspectives and gaining invaluable insights.

Among them was Dannielle Fougere, a fifth-year Ph.D. student in the Earth Sciences department at USC. Fougere, a paleoseismologist, is focused on understanding the behavior of the Garlock Fault in the Mojave Desert. Her research involves calculating slip rates to determine how fast the fault has been moving over time, which is crucial for understanding seismic activity in the region. Fougere explained, “ Paleoseismology is not very quantitative. With techniques learned here, I’d like to add a component in my thesis that’s a bit more quantitative to solidify things for editors and reviewers.”

Victor Olawoyin, a Ph.D. student in earthquake seismology at Boston College, found the workshop particularly beneficial for learning time series analysis and improving the reproducibility of his research. “The time series analysis is crucial for interpreting seismograms. The publishing part was also really cool, as it helps in releasing data and software with better workflow,” Olawoyin said. He plans to apply these new skills directly to his Ph.D. thesis and future research projects.

Venkataramana Sridhar, a faculty member from Virginia Tech specializing in hydrology, climate change, and water resources, saw the workshop as a way to enhance both his research and teaching. “This program covered a wide range of topics from publishing to data analytics, all crucial for my research on how climate change impacts hydrology and water resources. The skills and insights gained here will be invaluable in both my classroom teachings and research endeavors,” he stated.

Sreedevi Puthiyamadam Vasu, a Ph.D. student in Atmospheric Science at the Florida Institute of Technology, found the training perfectly tailored to her research needs. “My work on seasonal and sub-seasonal prediction of precipitation relies heavily on time series analysis. The focus on Python and R was exactly what I needed to transition from proprietary software like MATLAB to open-source tools, enhancing my ability to contribute to open science.”

Building a collaborative future

The workshop not only equipped participants with new technical skills but also fostered a sense of community and collaboration. The interdisciplinary nature of the event brought fresh perspectives to geoscientific challenges, inspiring innovative approaches and potential collaborations.

As these researchers return to their respective sub-fields equipped with new capabilities in open-source programming and FAIR principles, the hope is that the effects of this workshop will influence the broader geoscience community. The commitment to open science and reproducibility promises to drive forward the quality and impact of future geoscientific research.

The LinkedEarth team plans to continue supporting the geoscience community with similar training opportunities, fostering an ecosystem where scientific discoveries are not only made but shared openly and efficiently.

Published on July 15th, 2024

Last updated on July 15th, 2024


