[ECCV 2024] RAW-Adapter: Adapting Pre-trained
Visual Model to Camera RAW Images
Ziteng Cui1
Tatsuya Harada1,2
1. The University of Tokyo
2. RIKEN AIP
[Paper]
[GitHub]

Abstract

sRGB images are now the predominant choice for pre-training visual models in computer vision research, owing to their ease of acquisition and efficient storage. Meanwhile, the advantage of RAW images lies in the rich physical information they retain under challenging real-world lighting conditions. For computer vision tasks operating directly on camera RAW data, most existing studies integrate an image signal processor (ISP) with a backend network, yet often overlook the interaction between the ISP stages and the subsequent network. Drawing inspiration from ongoing adapter research in NLP and computer vision, we introduce RAW-Adapter, a novel approach for adapting sRGB pre-trained models to camera RAW data. RAW-Adapter comprises input-level adapters, which employ learnable ISP stages to adjust RAW inputs, and model-level adapters, which build connections between the ISP stages and subsequent high-level networks. Additionally, RAW-Adapter is a general framework that can be used with various computer vision frameworks. Extensive experiments under different lighting conditions demonstrate our algorithm's state-of-the-art (SOTA) performance, as well as its effectiveness and efficiency across a range of real-world and synthetic datasets.


Pre-train effect: performance of RAW-based visual tasks with and without sRGB pre-trained weights, for two methods: Dirty-Pixel (SIGGRAPH 2021) and RAW-Adapter. The blue line denotes training with MS COCO pre-trained weights, the purple line ImageNet pre-trained weights, and the yellow line training from scratch. Using sRGB pre-trained weights is crucial for RAW-based vision tasks.

Methods

RAW-Adapter is a general framework that can be built on various network backbones. It mainly consists of two parts: an Input-level Adapter and Model-level Adapters.

Input-level Adapter: includes (1) dynamic enhancement, (2) denoising, (3) sharpening, (4) white balance, (5) CCM, and (6) an implicit 3D LUT.

Model-level Adapter: builds connections between the ISP-stage representations above and model-level features (dotted lines in the figure below).
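As a rough illustration of two of the ISP stages listed above, here is a minimal NumPy sketch of white balance and CCM applied to a demosaiced RAW image. The function names, the gains, and the identity matrix are hypothetical placeholders for this demo, not the paper's learned stages:

```python
import numpy as np

def white_balance(img, gains):
    # img: HxWx3 float array in linear RAW space; gains: per-channel (R, G, B)
    return img * np.asarray(gains, dtype=img.dtype)

def apply_ccm(img, ccm):
    # ccm: 3x3 color correction matrix mapping camera RGB to sRGB primaries
    return img @ np.asarray(ccm, dtype=img.dtype).T

raw = np.random.rand(4, 4, 3).astype(np.float32)  # stand-in demosaiced RAW
gains = (2.0, 1.0, 1.5)                            # hypothetical WB gains
ccm = np.eye(3, dtype=np.float32)                  # identity CCM for the demo
out = apply_ccm(white_balance(raw, gains), ccm)
```

In RAW-Adapter these parameters are not fixed constants as in this sketch; they are predicted from the input, as described below.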


We adopt dynamic parameter prediction in the Input-level Adapter (see (b) in the figure above). Attention block P_K is responsible for predicting the parameters of (1) enhancement, (2) denoising, and (3) sharpening, while attention block P_M is responsible for predicting the parameters of (4) white balance and (5) CCM. The Model-level Adapter's merge block is shown in part (c) of the figure above.
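To contrast the learned white-balance prediction of P_M with a classical, non-learned baseline: the gray-world heuristic below derives per-channel gains from global image statistics. This is only a hedged stand-in to convey the idea of input-dependent parameters, not the paper's attention-based predictor:

```python
import numpy as np

def gray_world_gains(img):
    # Gray-world heuristic: choose per-channel gains so every channel's
    # mean matches the green channel's mean. A classical, non-learned
    # stand-in for the learned WB-gain prediction of P_M.
    means = img.reshape(-1, 3).mean(axis=0)
    return means[1] / means

rng = np.random.default_rng(0)
# Simulate an un-white-balanced RAW image with a color cast.
img = rng.random((8, 8, 3)).astype(np.float32) * np.array([0.5, 1.0, 0.8], np.float32)
gains = gray_world_gains(img)
balanced = img * gains  # after correction, all channel means are equal
```

The learned blocks in RAW-Adapter replace such fixed heuristics with parameters regressed from image features, which lets the ISP stages adapt to each input's lighting condition.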

The overall structure of RAW-Adapter is parameter-efficient: < 100K parameters for the Input-level Adapter and 100K ~ 600K for the Model-level Adapters. For more details, please refer to our paper.


Experimental Results

Detection performance on PASCAL RAW dataset and LOD dataset.


Segmentation performance on ADE20K RAW dataset.



Our Related Research

Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption. AAAI 2024. [Link] [Paper]
You Only Need 90K Parameters to Adapt Light: a Light Weight Transformer for Image Enhancement and Exposure Correction. BMVC 2022. [Link] [Paper]
Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection. ICCV 2021. [Link] [Paper]


BibTeX


@inproceedings{raw_adapter,
  title     = {RAW-Adapter: Adapting Pretrained Visual Model to Camera RAW Images},
  author    = {Ziteng Cui and Tatsuya Harada},
  booktitle = {ECCV},
  year      = {2024}
}


This template was originally made by Phillip Isola and Richard Zhang for a colorful ECCV project; the code can be found here.