2022-04-13 16:36:24,659 - mmcls - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.6.9 (default, Dec  8 2021, 21:08:43) [GCC 8.4.0]
CUDA available: False
GCC: x86_64-linux-gnu-gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0+cpu
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=0, USE_CUDNN=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.11.1+cpu
OpenCV: 4.5.5
MMCV: 1.4.8
MMCV Compiler: n/a
MMCV CUDA Compiler: n/a
MMClassification: 0.22.0+47f8c48
------------------------------------------------------------

2022-04-13 16:36:24,660 - mmcls - INFO - Distributed training: False
2022-04-13 16:36:26,119 - mmcls - INFO - Config:
model = dict(
    type='ImageClassifier',
    backbone=dict(
        type='VisionTransformer',
        arch='b',
        img_size=224,
        patch_size=16,
        drop_rate=0.1,
        init_cfg=dict(
            type='Pretrained',
            checkpoint=
            'https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth',
            prefix='backbone')),
    neck=None,
    head=dict(
        type='VisionTransformerClsHead',
        num_classes=1000,
        in_channels=768,
        loss=dict(type='CrossEntropyLoss', loss_weight=1.0)))
policy_imagenet = [[{
    'type': 'Posterize',
    'bits': 4,
    'prob': 0.4
}, {
    'type': 'Rotate',
    'angle': 30.0,
    'prob': 0.6
}],
                   [{
                       'type': 'Solarize',
                       'thr': 113.77777777777777,
                       'prob': 0.6
                   }, {
                       'type': 'AutoContrast',
                       'prob': 0.6
                   }],
                   [{
                       'type': 'Equalize',
                       'prob': 0.8
                   }, {
                       'type': 'Equalize',
                       'prob': 0.6
                   }],
                   [{
                       'type': 'Posterize',
                       'bits': 5,
                       'prob': 0.6
                   }, {
                       'type': 'Posterize',
                       'bits': 5,
                       'prob': 0.6
                   }],
                   [{
                       'type': 'Equalize',
                       'prob': 0.4
                   }, {
                       'type': 'Solarize',
                       'thr': 142.22222222222223,
                       'prob': 0.2
                   }],
                   [{
                       'type': 'Equalize',
                       'prob': 0.4
                   }, {
                       'type': 'Rotate',
                       'angle': 26.666666666666668,
                       'prob': 0.8
                   }],
                   [{
                       'type': 'Solarize',
                       'thr': 170.66666666666666,
                       'prob': 0.6
                   }, {
                       'type': 'Equalize',
                       'prob': 0.6
                   }],
                   [{
                       'type': 'Posterize',
                       'bits': 6,
                       'prob': 0.8
                   }, {
                       'type': 'Equalize',
                       'prob': 1.0
                   }],
                   [{
                       'type': 'Rotate',
                       'angle': 10.0,
                       'prob': 0.2
                   }, {
                       'type': 'Solarize',
                       'thr': 28.444444444444443,
                       'prob': 0.6
                   }],
                   [{
                       'type': 'Equalize',
                       'prob': 0.6
                   }, {
                       'type': 'Posterize',
                       'bits': 5,
                       'prob': 0.4
                   }],
                   [{
                       'type': 'Rotate',
                       'angle': 26.666666666666668,
                       'prob': 0.8
                   }, {
                       'type': 'ColorTransform',
                       'magnitude': 0.0,
                       'prob': 0.4
                   }],
                   [{
                       'type': 'Rotate',
                       'angle': 30.0,
                       'prob': 0.4
                   }, {
                       'type': 'Equalize',
                       'prob': 0.6
                   }],
                   [{
                       'type': 'Equalize',
                       'prob': 0.0
                   }, {
                       'type': 'Equalize',
                       'prob': 0.8
                   }],
                   [{
                       'type': 'Invert',
                       'prob': 0.6
                   }, {
                       'type': 'Equalize',
                       'prob': 1.0
                   }],
                   [{
                       'type': 'ColorTransform',
                       'magnitude': 0.4,
                       'prob': 0.6
                   }, {
                       'type': 'Contrast',
                       'magnitude': 0.8,
                       'prob': 1.0
                   }],
                   [{
                       'type': 'Rotate',
                       'angle': 26.666666666666668,
                       'prob': 0.8
                   }, {
                       'type': 'ColorTransform',
                       'magnitude': 0.2,
                       'prob': 1.0
                   }],
                   [{
                       'type': 'ColorTransform',
                       'magnitude': 0.8,
                       'prob': 0.8
                   }, {
                       'type': 'Solarize',
                       'thr': 56.888888888888886,
                       'prob': 0.8
                   }],
                   [{
                       'type': 'Sharpness',
                       'magnitude': 0.7,
                       'prob': 0.4
                   }, {
                       'type': 'Invert',
                       'prob': 0.6
                   }],
                   [{
                       'type': 'Shear',
                       'magnitude': 0.16666666666666666,
                       'prob': 0.6,
                       'direction': 'horizontal'
                   }, {
                       'type': 'Equalize',
                       'prob': 1.0
                   }],
                   [{
                       'type': 'ColorTransform',
                       'magnitude': 0.0,
                       'prob': 0.4
                   }, {
                       'type': 'Equalize',
                       'prob': 0.6
                   }],
                   [{
                       'type': 'Equalize',
                       'prob': 0.4
                   }, {
                       'type': 'Solarize',
                       'thr': 142.22222222222223,
                       'prob': 0.2
                   }],
                   [{
                       'type': 'Solarize',
                       'thr': 113.77777777777777,
                       'prob': 0.6
                   }, {
                       'type': 'AutoContrast',
                       'prob': 0.6
                   }],
                   [{
                       'type': 'Invert',
                       'prob': 0.6
                   }, {
                       'type': 'Equalize',
                       'prob': 1.0
                   }],
                   [{
                       'type': 'ColorTransform',
                       'magnitude': 0.4,
                       'prob': 0.6
                   }, {
                       'type': 'Contrast',
                       'magnitude': 0.8,
                       'prob': 1.0
                   }],
                   [{
                       'type': 'Equalize',
                       'prob': 0.8
                   }, {
                       'type': 'Equalize',
                       'prob': 0.6
                   }]]
dataset_type = 'ImageNet'
img_norm_cfg = dict(
    mean=[127.5, 127.5, 127.5], std=[127.5, 127.5, 127.5], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='RandomResizedCrop', size=224, backend='pillow'),
    dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
    dict(
        type='Normalize',
        mean=[127.5, 127.5, 127.5],
        std=[127.5, 127.5, 127.5],
        to_rgb=True),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToTensor', keys=['gt_label']),
    dict(type='ToHalf', keys=['img']),
    dict(type='Collect', keys=['img', 'gt_label'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', size=(224, -1), backend='pillow'),
    dict(type='CenterCrop', crop_size=224),
    dict(
        type='Normalize',
        mean=[127.5, 127.5, 127.5],
        std=[127.5, 127.5, 127.5],
        to_rgb=True),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='ToHalf', keys=['img']),
    dict(type='Collect', keys=['img'])
]
data = dict(
    samples_per_gpu=17,
    workers_per_gpu=16,
    train=dict(
        type='ImageNet',
        data_prefix='data/imagenet/train',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='RandomResizedCrop', size=224, backend='pillow'),
            dict(type='RandomFlip', flip_prob=0.5, direction='horizontal'),
            dict(
                type='Normalize',
                mean=[127.5, 127.5, 127.5],
                std=[127.5, 127.5, 127.5],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='ToTensor', keys=['gt_label']),
            dict(type='ToHalf', keys=['img']),
            dict(type='Collect', keys=['img', 'gt_label'])
        ]),
    val=dict(
        type='ImageNet',
        data_prefix='data/imagenet/val',
        ann_file='data/imagenet/meta/val.txt',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='Resize', size=(224, -1), backend='pillow'),
            dict(type='CenterCrop', crop_size=224),
            dict(
                type='Normalize',
                mean=[127.5, 127.5, 127.5],
                std=[127.5, 127.5, 127.5],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='ToHalf', keys=['img']),
            dict(type='Collect', keys=['img'])
        ]),
    test=dict(
        type='ImageNet',
        data_prefix='data/imagenet/val',
        ann_file='data/imagenet/meta/val.txt',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='Resize', size=(224, -1), backend='pillow'),
            dict(type='CenterCrop', crop_size=224),
            dict(
                type='Normalize',
                mean=[127.5, 127.5, 127.5],
                std=[127.5, 127.5, 127.5],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='ToHalf', keys=['img']),
            dict(type='Collect', keys=['img'])
        ]),
    drop_last=True,
    train_dataloader=dict(mode='async'),
    val_dataloader=dict(samples_per_gpu=4, workers_per_gpu=1),
    test_dataloader=dict(samples_per_gpu=4, workers_per_gpu=1))
evaluation = dict(interval=1, metric='accuracy')
checkpoint_config = dict(interval=1000)
log_config = dict(interval=100, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
paramwise_cfg = dict(
    custom_keys=dict({
        '.cls_token': dict(decay_mult=0.0),
        '.pos_embed': dict(decay_mult=0.0)
    }))
optimizer_config = dict()
optimizer = dict(
    type='SGD',
    lr=0.08,
    weight_decay=1e-05,
    momentum=0.9,
    paramwise_cfg=dict(
        custom_keys=dict({
            '.cls_token': dict(decay_mult=0.0),
            '.pos_embed': dict(decay_mult=0.0)
        })))
lr_config = dict(
    policy='CosineAnnealing',
    min_lr=0,
    warmup='linear',
    warmup_iters=800,
    warmup_ratio=0.02)
ipu_model_cfg = dict(
    train_split_edges=[
        dict(layer_to_call='backbone.patch_embed', ipu_id=0),
        dict(layer_to_call='backbone.layers.3', ipu_id=1),
        dict(layer_to_call='backbone.layers.6', ipu_id=2),
        dict(layer_to_call='backbone.layers.9', ipu_id=3)
    ],
    train_ckpt_nodes=[
        'backbone.layers.0', 'backbone.layers.1', 'backbone.layers.2',
        'backbone.layers.3', 'backbone.layers.4', 'backbone.layers.5',
        'backbone.layers.6', 'backbone.layers.7', 'backbone.layers.8',
        'backbone.layers.9', 'backbone.layers.10', 'backbone.layers.11'
    ])
options_cfg = dict(
    randomSeed=42,
    partialsType='half',
    train_cfg=dict(
        executionStrategy='SameAsIpu',
        Training=dict(gradientAccumulation=32),
        availableMemoryProportion=[0.3, 0.3, 0.3, 0.3]),
    eval_cfg=dict(deviceIterations=1))
runner = dict(
    type='IterBasedRunner',
    ipu_model_cfg=dict(
        train_split_edges=[
            dict(layer_to_call='backbone.patch_embed', ipu_id=0),
            dict(layer_to_call='backbone.layers.3', ipu_id=1),
            dict(layer_to_call='backbone.layers.6', ipu_id=2),
            dict(layer_to_call='backbone.layers.9', ipu_id=3)
        ],
        train_ckpt_nodes=[
            'backbone.layers.0', 'backbone.layers.1', 'backbone.layers.2',
            'backbone.layers.3', 'backbone.layers.4', 'backbone.layers.5',
            'backbone.layers.6', 'backbone.layers.7', 'backbone.layers.8',
            'backbone.layers.9', 'backbone.layers.10', 'backbone.layers.11'
        ]),
    options_cfg=dict(
        randomSeed=42,
        partialsType='half',
        train_cfg=dict(
            executionStrategy='SameAsIpu',
            Training=dict(gradientAccumulation=32),
            availableMemoryProportion=[0.3, 0.3, 0.3, 0.3]),
        eval_cfg=dict(deviceIterations=1)),
    max_iters=5000)
fp16 = dict(loss_scale=256.0, velocity_accum_type='half', accum_type='half')
work_dir = './work_dirs/vit-base-p16_ft-4xb544_in1k-224_ipu'
gpu_ids = [0]
ipu_replicas = 4

2022-04-13 16:36:26,120 - mmcls - INFO - Set random seed to 212038955, deterministic: False
2022-04-13 16:36:27,208 - mmcls - INFO - initialize VisionTransformer with init_cfg {'type': 'Pretrained', 'checkpoint': 'https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth', 'prefix': 'backbone'}
2022-04-13 16:36:31,729 - mmcls - INFO - initialize VisionTransformerClsHead with init_cfg {'type': 'Constant', 'layer': 'Linear', 'val': 0}
Name of parameter - Initialization information

backbone.cls_token - torch.Size([1, 1, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.pos_embed - torch.Size([1, 197, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.patch_embed.projection.weight - torch.Size([768, 3, 16, 16]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.patch_embed.projection.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.0.ln1.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.0.ln1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.0.attn.qkv.weight - torch.Size([2304, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.0.attn.qkv.bias - torch.Size([2304]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.0.attn.proj.weight - torch.Size([768, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.0.attn.proj.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.0.ln2.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.0.ln2.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.0.ffn.layers.0.0.weight - torch.Size([3072, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.0.ffn.layers.0.0.bias - torch.Size([3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.0.ffn.layers.1.weight - torch.Size([768, 3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.0.ffn.layers.1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.1.ln1.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.1.ln1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.1.attn.qkv.weight - torch.Size([2304, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.1.attn.qkv.bias - torch.Size([2304]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.1.attn.proj.weight - torch.Size([768, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.1.attn.proj.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.1.ln2.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.1.ln2.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.1.ffn.layers.0.0.weight - torch.Size([3072, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.1.ffn.layers.0.0.bias - torch.Size([3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.1.ffn.layers.1.weight - torch.Size([768, 3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.1.ffn.layers.1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.2.ln1.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.2.ln1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.2.attn.qkv.weight - torch.Size([2304, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.2.attn.qkv.bias - torch.Size([2304]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.2.attn.proj.weight - torch.Size([768, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.2.attn.proj.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.2.ln2.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.2.ln2.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.2.ffn.layers.0.0.weight - torch.Size([3072, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.2.ffn.layers.0.0.bias - torch.Size([3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.2.ffn.layers.1.weight - torch.Size([768, 3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.2.ffn.layers.1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.3.ln1.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.3.ln1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.3.attn.qkv.weight - torch.Size([2304, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.3.attn.qkv.bias - torch.Size([2304]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.3.attn.proj.weight - torch.Size([768, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.3.attn.proj.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.3.ln2.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.3.ln2.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.3.ffn.layers.0.0.weight - torch.Size([3072, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.3.ffn.layers.0.0.bias - torch.Size([3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.3.ffn.layers.1.weight - torch.Size([768, 3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.3.ffn.layers.1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.4.ln1.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.4.ln1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.4.attn.qkv.weight - torch.Size([2304, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.4.attn.qkv.bias - torch.Size([2304]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.4.attn.proj.weight - torch.Size([768, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.4.attn.proj.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.4.ln2.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.4.ln2.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.4.ffn.layers.0.0.weight - torch.Size([3072, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.4.ffn.layers.0.0.bias - torch.Size([3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.4.ffn.layers.1.weight - torch.Size([768, 3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.4.ffn.layers.1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.5.ln1.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.5.ln1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.5.attn.qkv.weight - torch.Size([2304, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.5.attn.qkv.bias - torch.Size([2304]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.5.attn.proj.weight - torch.Size([768, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.5.attn.proj.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.5.ln2.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.5.ln2.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.5.ffn.layers.0.0.weight - torch.Size([3072, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.5.ffn.layers.0.0.bias - torch.Size([3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.5.ffn.layers.1.weight - torch.Size([768, 3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.5.ffn.layers.1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.6.ln1.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.6.ln1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.6.attn.qkv.weight - torch.Size([2304, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.6.attn.qkv.bias - torch.Size([2304]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.6.attn.proj.weight - torch.Size([768, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.6.attn.proj.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.6.ln2.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.6.ln2.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.6.ffn.layers.0.0.weight - torch.Size([3072, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.6.ffn.layers.0.0.bias - torch.Size([3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.6.ffn.layers.1.weight - torch.Size([768, 3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.6.ffn.layers.1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.7.ln1.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.7.ln1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.7.attn.qkv.weight - torch.Size([2304, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.7.attn.qkv.bias - torch.Size([2304]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.7.attn.proj.weight - torch.Size([768, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.7.attn.proj.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.7.ln2.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.7.ln2.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.7.ffn.layers.0.0.weight - torch.Size([3072, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.7.ffn.layers.0.0.bias - torch.Size([3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.7.ffn.layers.1.weight - torch.Size([768, 3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.7.ffn.layers.1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.8.ln1.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.8.ln1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.8.attn.qkv.weight - torch.Size([2304, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.8.attn.qkv.bias - torch.Size([2304]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.8.attn.proj.weight - torch.Size([768, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.8.attn.proj.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.8.ln2.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.8.ln2.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.8.ffn.layers.0.0.weight - torch.Size([3072, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.8.ffn.layers.0.0.bias - torch.Size([3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.8.ffn.layers.1.weight - torch.Size([768, 3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.8.ffn.layers.1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.9.ln1.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.9.ln1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.9.attn.qkv.weight - torch.Size([2304, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.9.attn.qkv.bias - torch.Size([2304]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.9.attn.proj.weight - torch.Size([768, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.9.attn.proj.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.9.ln2.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.9.ln2.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.9.ffn.layers.0.0.weight - torch.Size([3072, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.9.ffn.layers.0.0.bias - torch.Size([3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.9.ffn.layers.1.weight - torch.Size([768, 3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.9.ffn.layers.1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.10.ln1.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.10.ln1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.10.attn.qkv.weight - torch.Size([2304, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.10.attn.qkv.bias - torch.Size([2304]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.10.attn.proj.weight - torch.Size([768, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.10.attn.proj.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.10.ln2.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.10.ln2.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.10.ffn.layers.0.0.weight - torch.Size([3072, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.10.ffn.layers.0.0.bias - torch.Size([3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.10.ffn.layers.1.weight - torch.Size([768, 3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.10.ffn.layers.1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.11.ln1.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.11.ln1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.11.attn.qkv.weight - torch.Size([2304, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.11.attn.qkv.bias - torch.Size([2304]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.11.attn.proj.weight - torch.Size([768, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.11.attn.proj.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.11.ln2.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.11.ln2.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.11.ffn.layers.0.0.weight - torch.Size([3072, 768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.11.ffn.layers.0.0.bias - torch.Size([3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.11.ffn.layers.1.weight - torch.Size([768, 3072]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.layers.11.ffn.layers.1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.ln1.weight - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

backbone.ln1.bias - torch.Size([768]): 
PretrainedInit: load from https://download.openmmlab.com/mmclassification/v0/vit/pretrain/vit-base-p16_3rdparty_pt-64xb64_in1k-224_20210928-02284250.pth 

head.layers.head.weight - torch.Size([1000, 768]): 
ConstantInit: val=0, bias=0 

head.layers.head.bias - torch.Size([1000]): 
ConstantInit: val=0, bias=0 
2022-04-13 16:52:36,044 - mmcls - INFO - Start running, host: dihu@sgjur-pod002-3, work_dir: /localdata/cn-customer-engineering/hudi/mmclassification/work_dirs/vit-base-p16_ft-4xb544_in1k-224_ipu
2022-04-13 16:52:36,045 - mmcls - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) ipu_lr_hook_class                  
(NORMAL      ) CheckpointHook                     
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) ipu_lr_hook_class                  
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_train_iter:
(VERY_HIGH   ) ipu_lr_hook_class                  
(LOW         ) IterTimerHook                      
 -------------------- 
after_train_iter:
(ABOVE_NORMAL) IPUFp16OptimizerHook               
(NORMAL      ) CheckpointHook                     
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) CheckpointHook                     
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_epoch:
(LOW         ) IterTimerHook                      
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
before_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_iter:
(LOW         ) IterTimerHook                      
 -------------------- 
after_val_epoch:
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
after_run:
(VERY_LOW    ) TextLoggerHook                     
 -------------------- 
2022-04-13 16:52:36,046 - mmcls - INFO - workflow: [('train', 1)], max: 5000 iters
2022-04-13 16:52:36,047 - mmcls - INFO - Checkpoints will be saved to /localdata/cn-customer-engineering/hudi/mmclassification/work_dirs/vit-base-p16_ft-4xb544_in1k-224_ipu by HardDiskBackend.
2022-04-13 17:02:18,342 - mmcls - INFO - Iter [100/5000]	lr: 1.129e-02, eta: 7:55:32, time: 5.823, data_time: 0.071, loss: 4.9751
2022-04-13 17:03:02,768 - mmcls - INFO - Iter [200/5000]	lr: 2.102e-02, eta: 4:10:41, time: 0.444, data_time: 0.107, loss: 1.7558
2022-04-13 17:03:47,201 - mmcls - INFO - Iter [300/5000]	lr: 3.063e-02, eta: 2:55:14, time: 0.444, data_time: 0.106, loss: 1.4765
2022-04-13 17:04:26,740 - mmcls - INFO - Iter [400/5000]	lr: 4.007e-02, eta: 2:16:12, time: 0.395, data_time: 0.060, loss: 1.4278
2022-04-13 17:05:03,549 - mmcls - INFO - Iter [500/5000]	lr: 4.927e-02, eta: 1:52:07, time: 0.368, data_time: 0.034, loss: 1.5029
2022-04-13 17:05:42,101 - mmcls - INFO - Iter [600/5000]	lr: 5.819e-02, eta: 1:36:04, time: 0.386, data_time: 0.051, loss: 1.5051
2022-04-13 17:06:18,627 - mmcls - INFO - Iter [700/5000]	lr: 6.678e-02, eta: 1:24:12, time: 0.365, data_time: 0.031, loss: 1.5077
2022-04-13 17:06:55,164 - mmcls - INFO - Iter [800/5000]	lr: 7.497e-02, eta: 1:15:10, time: 0.365, data_time: 0.031, loss: 1.5128
2022-04-13 17:07:31,699 - mmcls - INFO - Iter [900/5000]	lr: 7.379e-02, eta: 1:08:00, time: 0.365, data_time: 0.031, loss: 1.5276
2022-04-13 17:08:08,426 - mmcls - INFO - Saving checkpoint at 1000 iterations
2022-04-13 17:08:09,223 - mmcls - INFO - Iter [1000/5000]	lr: 7.238e-02, eta: 1:02:12, time: 0.375, data_time: 0.033, loss: 1.4408
2022-04-13 17:08:45,764 - mmcls - INFO - Iter [1100/5000]	lr: 7.084e-02, eta: 0:57:18, time: 0.365, data_time: 0.031, loss: 1.4149
2022-04-13 17:09:24,350 - mmcls - INFO - Iter [1200/5000]	lr: 6.918e-02, eta: 0:53:12, time: 0.386, data_time: 0.051, loss: 1.3670
2022-04-13 17:10:01,339 - mmcls - INFO - Iter [1300/5000]	lr: 6.740e-02, eta: 0:49:35, time: 0.370, data_time: 0.035, loss: 1.2952
2022-04-13 17:10:38,860 - mmcls - INFO - Iter [1400/5000]	lr: 6.552e-02, eta: 0:46:24, time: 0.375, data_time: 0.041, loss: 1.2759
2022-04-13 17:11:16,655 - mmcls - INFO - Iter [1500/5000]	lr: 6.353e-02, eta: 0:43:34, time: 0.378, data_time: 0.044, loss: 1.2387
2022-04-13 17:11:54,763 - mmcls - INFO - Iter [1600/5000]	lr: 6.145e-02, eta: 0:41:02, time: 0.381, data_time: 0.047, loss: 1.2119
2022-04-13 17:12:32,938 - mmcls - INFO - Iter [1700/5000]	lr: 5.929e-02, eta: 0:38:43, time: 0.382, data_time: 0.047, loss: 1.1715
2022-04-13 17:13:12,331 - mmcls - INFO - Iter [1800/5000]	lr: 5.705e-02, eta: 0:36:37, time: 0.394, data_time: 0.059, loss: 1.1356
2022-04-13 17:13:49,791 - mmcls - INFO - Iter [1900/5000]	lr: 5.475e-02, eta: 0:34:38, time: 0.375, data_time: 0.040, loss: 1.0733
2022-04-13 17:14:27,850 - mmcls - INFO - Saving checkpoint at 2000 iterations
2022-04-13 17:14:28,516 - mmcls - INFO - Iter [2000/5000]	lr: 5.238e-02, eta: 0:32:48, time: 0.387, data_time: 0.046, loss: 1.0650
2022-04-13 17:15:06,641 - mmcls - INFO - Iter [2100/5000]	lr: 4.997e-02, eta: 0:31:05, time: 0.381, data_time: 0.045, loss: 1.0690
2022-04-13 17:15:45,412 - mmcls - INFO - Iter [2200/5000]	lr: 4.752e-02, eta: 0:29:28, time: 0.388, data_time: 0.052, loss: 1.0750
2022-04-13 17:16:25,403 - mmcls - INFO - Iter [2300/5000]	lr: 4.504e-02, eta: 0:27:57, time: 0.400, data_time: 0.060, loss: 1.0298
2022-04-13 17:17:07,474 - mmcls - INFO - Iter [2400/5000]	lr: 4.254e-02, eta: 0:26:34, time: 0.421, data_time: 0.081, loss: 0.9902
2022-04-13 17:17:44,876 - mmcls - INFO - Iter [2500/5000]	lr: 4.003e-02, eta: 0:25:08, time: 0.374, data_time: 0.039, loss: 0.9331
2022-04-13 17:18:22,266 - mmcls - INFO - Iter [2600/5000]	lr: 3.751e-02, eta: 0:23:47, time: 0.374, data_time: 0.039, loss: 0.9368
2022-04-13 17:19:03,130 - mmcls - INFO - Iter [2700/5000]	lr: 3.501e-02, eta: 0:22:31, time: 0.409, data_time: 0.064, loss: 0.9308
2022-04-13 17:19:45,023 - mmcls - INFO - Iter [2800/5000]	lr: 3.253e-02, eta: 0:21:19, time: 0.419, data_time: 0.076, loss: 0.8999
2022-04-13 17:20:24,164 - mmcls - INFO - Iter [2900/5000]	lr: 3.008e-02, eta: 0:20:07, time: 0.391, data_time: 0.056, loss: 0.9042
2022-04-13 17:21:08,215 - mmcls - INFO - Saving checkpoint at 3000 iterations
2022-04-13 17:21:08,997 - mmcls - INFO - Iter [3000/5000]	lr: 2.766e-02, eta: 0:19:01, time: 0.448, data_time: 0.097, loss: 0.8575
2022-04-13 17:21:46,340 - mmcls - INFO - Iter [3100/5000]	lr: 2.530e-02, eta: 0:17:52, time: 0.373, data_time: 0.039, loss: 0.8599
2022-04-13 17:22:25,144 - mmcls - INFO - Iter [3200/5000]	lr: 2.299e-02, eta: 0:16:46, time: 0.388, data_time: 0.053, loss: 0.8356
2022-04-13 17:23:05,904 - mmcls - INFO - Iter [3300/5000]	lr: 2.075e-02, eta: 0:15:42, time: 0.408, data_time: 0.064, loss: 0.8158
2022-04-13 17:23:45,077 - mmcls - INFO - Iter [3400/5000]	lr: 1.859e-02, eta: 0:14:39, time: 0.392, data_time: 0.058, loss: 0.7750
2022-04-13 17:24:31,449 - mmcls - INFO - Iter [3500/5000]	lr: 1.651e-02, eta: 0:13:40, time: 0.464, data_time: 0.109, loss: 0.8225
2022-04-13 17:25:13,011 - mmcls - INFO - Iter [3600/5000]	lr: 1.452e-02, eta: 0:12:41, time: 0.416, data_time: 0.080, loss: 0.7546
2022-04-13 17:25:53,874 - mmcls - INFO - Iter [3700/5000]	lr: 1.264e-02, eta: 0:11:41, time: 0.409, data_time: 0.072, loss: 0.7133
2022-04-13 17:26:32,456 - mmcls - INFO - Iter [3800/5000]	lr: 1.086e-02, eta: 0:10:43, time: 0.386, data_time: 0.051, loss: 0.7266
2022-04-13 17:27:11,448 - mmcls - INFO - Iter [3900/5000]	lr: 9.195e-03, eta: 0:09:45, time: 0.390, data_time: 0.055, loss: 0.7177
2022-04-13 17:27:49,789 - mmcls - INFO - Saving checkpoint at 4000 iterations
2022-04-13 17:27:50,406 - mmcls - INFO - Iter [4000/5000]	lr: 7.654e-03, eta: 0:08:48, time: 0.390, data_time: 0.049, loss: 0.7267
2022-04-13 17:28:32,828 - mmcls - INFO - Iter [4100/5000]	lr: 6.240e-03, eta: 0:07:53, time: 0.424, data_time: 0.087, loss: 0.7084
2022-04-13 17:29:13,154 - mmcls - INFO - Iter [4200/5000]	lr: 4.960e-03, eta: 0:06:58, time: 0.403, data_time: 0.069, loss: 0.7003
2022-04-13 17:29:51,701 - mmcls - INFO - Iter [4300/5000]	lr: 3.818e-03, eta: 0:06:03, time: 0.385, data_time: 0.051, loss: 0.6764
2022-04-13 17:30:30,116 - mmcls - INFO - Iter [4400/5000]	lr: 2.818e-03, eta: 0:05:10, time: 0.384, data_time: 0.050, loss: 0.6887
2022-04-13 17:31:08,479 - mmcls - INFO - Iter [4500/5000]	lr: 1.966e-03, eta: 0:04:16, time: 0.384, data_time: 0.049, loss: 0.6532
2022-04-13 17:31:47,110 - mmcls - INFO - Iter [4600/5000]	lr: 1.263e-03, eta: 0:03:24, time: 0.386, data_time: 0.052, loss: 0.6630
2022-04-13 17:32:25,315 - mmcls - INFO - Iter [4700/5000]	lr: 7.132e-04, eta: 0:02:32, time: 0.382, data_time: 0.048, loss: 0.7071
2022-04-13 17:33:04,552 - mmcls - INFO - Iter [4800/5000]	lr: 3.186e-04, eta: 0:01:41, time: 0.392, data_time: 0.058, loss: 0.6766
2022-04-13 17:33:44,367 - mmcls - INFO - Iter [4900/5000]	lr: 8.052e-05, eta: 0:00:50, time: 0.398, data_time: 0.063, loss: 0.6759
2022-04-13 17:34:23,525 - mmcls - INFO - Saving checkpoint at 5000 iterations
2022-04-13 17:34:25,273 - mmcls - INFO - Iter [5000/5000]	lr: 7.896e-09, eta: 0:00:00, time: 0.409, data_time: 0.056, loss: 0.6554