2023/05/29 19:28:13 - mmengine - INFO - 
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.9.0 (default, Nov 15 2020, 14:28:56) [GCC 7.3.0]
    CUDA available: True
    numpy_random_seed: 23917156
    GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 11.3, V11.3.109
    GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    PyTorch: 1.12.1+cu113
    PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.2
    - Built with CuDNN 8.3.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

    TorchVision: 0.13.1+cu102
    OpenCV: 4.7.0
    MMEngine: 0.7.3

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: None
    diff_rank_seed: False
    deterministic: False
    Distributed launcher: pytorch
    Distributed training: True
    GPU number: 32
------------------------------------------------------------

2023/05/29 19:28:14 - mmengine - INFO - Config:
model = dict(
    _scope_='mmrazor',
    type='SingleTeacherDistill',
    architecture=dict(
        cfg_path=
        'mmaction::recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb.py',
        backbone=dict(pretrained=False),
        pretrained=False),
    teacher=dict(
        cfg_path=
        'mmaction::recognition/tsn/custom_backbones/tsn_imagenet-pretrained-swin-transformer_32xb8-1x1x8-50e_kinetics400-rgb.py',
        pretrained=False),
    teacher_ckpt=
    'work_dirs/tsn_imagenet-pretrained-swin-transformer_32xb8-1x1x8-50e_kinetics400-rgb/best_acc_top1_epoch_47.pth',
    distiller=dict(
        type='ConfigurableDistiller',
        student_recorders=dict(
            logits=dict(type='ModuleOutputs', source='cls_head.fc_cls')),
        teacher_recorders=dict(
            logits=dict(type='ModuleOutputs', source='cls_head.fc_cls')),
        distill_losses=dict(
            loss_dist=dict(
                type='DISTLoss',
                inter_loss_weight=1.0,
                intra_loss_weight=1.0,
                tau=1,
                loss_weight=4)),
        loss_forward_mappings=dict(
            loss_dist=dict(
                logits_S=dict(from_student=True, recorder='logits'),
                logits_T=dict(from_student=False, recorder='logits')))))
train_cfg = dict(
    type='EpochBasedTrainLoop',
    max_epochs=100,
    val_begin=1,
    val_interval=1,
    _scope_='mmaction')
val_cfg = dict(type='mmrazor.SingleTeacherDistillValLoop')
test_cfg = dict(type='TestLoop', _scope_='mmaction')
param_scheduler = [
    dict(
        type='MultiStepLR',
        begin=0,
        end=100,
        by_epoch=True,
        milestones=[40, 80],
        gamma=0.1,
        _scope_='mmaction')
]
optim_wrapper = dict(
    optimizer=dict(
        type='SGD',
        lr=0.01,
        momentum=0.9,
        weight_decay=0.0001,
        _scope_='mmaction'),
    clip_grad=dict(max_norm=40, norm_type=2))
default_scope = 'mmaction'
default_hooks = dict(
    runtime_info=dict(type='RuntimeInfoHook', _scope_='mmaction'),
    timer=dict(type='IterTimerHook', _scope_='mmaction'),
    logger=dict(
        type='LoggerHook', interval=20, ignore_last=False, _scope_='mmaction'),
    param_scheduler=dict(type='ParamSchedulerHook', _scope_='mmaction'),
    checkpoint=dict(
        type='CheckpointHook',
        interval=3,
        save_best='auto',
        max_keep_ckpts=3,
        _scope_='mmaction'),
    sampler_seed=dict(type='DistSamplerSeedHook', _scope_='mmaction'),
    sync_buffers=dict(type='SyncBuffersHook', _scope_='mmaction'))
env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'))
log_processor = dict(
    type='LogProcessor', window_size=20, by_epoch=True, _scope_='mmaction')
vis_backends = [dict(type='LocalVisBackend', _scope_='mmaction')]
visualizer = dict(
    type='ActionVisualizer',
    vis_backends=[dict(type='LocalVisBackend')],
    _scope_='mmaction')
log_level = 'INFO'
load_from = None
resume = True
dataset_type = 'VideoDataset'
data_root = 'data/kinetics400/videos_train'
data_root_val = 'data/kinetics400/videos_val'
ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt'
ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt'
file_client_args = dict(io_backend='disk')
train_pipeline = [
    dict(type='DecordInit', io_backend='disk', _scope_='mmaction'),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        _scope_='mmaction'),
    dict(type='DecordDecode', _scope_='mmaction'),
    dict(type='Resize', scale=(-1, 256), _scope_='mmaction'),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.875, 0.75, 0.66),
        random_crop=False,
        max_wh_scale_gap=1,
        _scope_='mmaction'),
    dict(
        type='Resize', scale=(224, 224), keep_ratio=False, _scope_='mmaction'),
    dict(type='Flip', flip_ratio=0.5, _scope_='mmaction'),
    dict(type='FormatShape', input_format='NCHW', _scope_='mmaction'),
    dict(type='PackActionInputs', _scope_='mmaction')
]
val_pipeline = [
    dict(type='DecordInit', io_backend='disk', _scope_='mmaction'),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=8,
        test_mode=True,
        _scope_='mmaction'),
    dict(type='DecordDecode', _scope_='mmaction'),
    dict(type='Resize', scale=(-1, 256), _scope_='mmaction'),
    dict(type='CenterCrop', crop_size=224, _scope_='mmaction'),
    dict(type='FormatShape', input_format='NCHW', _scope_='mmaction'),
    dict(type='PackActionInputs', _scope_='mmaction')
]
test_pipeline = [
    dict(type='DecordInit', io_backend='disk', _scope_='mmaction'),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=25,
        test_mode=True,
        _scope_='mmaction'),
    dict(type='DecordDecode', _scope_='mmaction'),
    dict(type='Resize', scale=(-1, 256), _scope_='mmaction'),
    dict(type='TenCrop', crop_size=224, _scope_='mmaction'),
    dict(type='FormatShape', input_format='NCHW', _scope_='mmaction'),
    dict(type='PackActionInputs', _scope_='mmaction')
]
train_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True, _scope_='mmaction'),
    dataset=dict(
        type='VideoDataset',
        ann_file='data/kinetics400/kinetics400_train_list_videos.txt',
        data_prefix=dict(video='data/kinetics400/videos_train'),
        pipeline=[
            dict(type='DecordInit', io_backend='disk'),
            dict(
                type='SampleFrames', clip_len=1, frame_interval=1,
                num_clips=8),
            dict(type='DecordDecode'),
            dict(type='Resize', scale=(-1, 256)),
            dict(
                type='MultiScaleCrop',
                input_size=224,
                scales=(1, 0.875, 0.75, 0.66),
                random_crop=False,
                max_wh_scale_gap=1),
            dict(type='Resize', scale=(224, 224), keep_ratio=False),
            dict(type='Flip', flip_ratio=0.5),
            dict(type='FormatShape', input_format='NCHW'),
            dict(type='PackActionInputs')
        ],
        _scope_='mmaction'))
val_dataloader = dict(
    batch_size=8,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False, _scope_='mmaction'),
    dataset=dict(
        type='VideoDataset',
        ann_file='data/kinetics400/kinetics400_val_list_videos.txt',
        data_prefix=dict(video='data/kinetics400/videos_val'),
        pipeline=[
            dict(type='DecordInit', io_backend='disk'),
            dict(
                type='SampleFrames',
                clip_len=1,
                frame_interval=1,
                num_clips=8,
                test_mode=True),
            dict(type='DecordDecode'),
            dict(type='Resize', scale=(-1, 256)),
            dict(type='CenterCrop', crop_size=224),
            dict(type='FormatShape', input_format='NCHW'),
            dict(type='PackActionInputs')
        ],
        test_mode=True,
        _scope_='mmaction'))
test_dataloader = dict(
    batch_size=1,
    num_workers=8,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=False, _scope_='mmaction'),
    dataset=dict(
        type='VideoDataset',
        ann_file='data/kinetics400/kinetics400_val_list_videos.txt',
        data_prefix=dict(video='data/kinetics400/videos_val'),
        pipeline=[
            dict(type='DecordInit', io_backend='disk'),
            dict(
                type='SampleFrames',
                clip_len=1,
                frame_interval=1,
                num_clips=25,
                test_mode=True),
            dict(type='DecordDecode'),
            dict(type='Resize', scale=(-1, 256)),
            dict(type='TenCrop', crop_size=224),
            dict(type='FormatShape', input_format='NCHW'),
            dict(type='PackActionInputs')
        ],
        test_mode=True,
        _scope_='mmaction'))
val_evaluator = dict(type='AccMetric', _scope_='mmaction')
test_evaluator = dict(type='AccMetric', _scope_='mmaction')
auto_scale_lr = dict(enable=True, base_batch_size=256)
teacher_ckpt = 'work_dirs/tsn_imagenet-pretrained-swin-transformer_32xb8-1x1x8-50e_kinetics400-rgb/best_acc_top1_epoch_47.pth'
launcher = 'pytorch'
work_dir = './work_dirs/tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4'
randomness = dict(seed=None, diff_rank_seed=False, deterministic=False)
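
Note on the learning-rate schedule in the config above: with `MultiStepLR` (milestones 40 and 80, gamma 0.1) and a base LR of 0.01, the rate would be expected to drop tenfold at epoch 40 and again at epoch 80. The sketch below only illustrates that arithmetic, assuming the usual step-decay rule with 0-indexed epochs. The helper name `multistep_lr` is hypothetical and is not part of MMEngine.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, gamma, epoch):
    """Step decay: multiply base_lr by gamma once per milestone already reached."""
    # Number of milestones that are <= the current (0-indexed) epoch.
    passed = bisect_right(sorted(milestones), epoch)
    return base_lr * gamma ** passed


# Values taken from the config dump above (lr=0.01, milestones=[40, 80], gamma=0.1).
for epoch in (0, 39, 40, 79, 80, 99):
    print(epoch, multistep_lr(0.01, [40, 80], 0.1, epoch))
```

Under these assumptions the run would train at roughly 0.01 for epochs 0 to 39, 0.001 for epochs 40 to 79, and 0.0001 for the remaining epochs.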

2023/05/29 19:28:22 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook                    
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
before_train:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(NORMAL      ) DistSamplerSeedHook                
 -------------------- 
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_train_epoch:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) SyncBuffersHook                    
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_val_epoch:
(NORMAL      ) IterTimerHook                      
(NORMAL      ) SyncBuffersHook                    
 -------------------- 
before_val_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_val_iter:
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
(LOW         ) ParamSchedulerHook                 
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
after_train:
(VERY_LOW    ) CheckpointHook                     
 -------------------- 
before_test_epoch:
(NORMAL      ) IterTimerHook                      
 -------------------- 
before_test_iter:
(NORMAL      ) IterTimerHook                      
 -------------------- 
after_test_iter:
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook                    
(NORMAL      ) IterTimerHook                      
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
after_run:
(BELOW_NORMAL) LoggerHook                         
 -------------------- 
2023/05/29 19:28:23 - mmengine - INFO - LR is set based on batch size of 256 and the current batch size is 256. Scaling the original LR by 1.0.
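
The scaling factor reported above is consistent with the linear scaling rule: 32 GPUs times 8 samples per GPU gives an effective batch size of 256, which equals `base_batch_size`, so the ratio is 1.0 and the LR stays at 0.01. The sketch below reproduces that arithmetic. The function name `scaled_lr` is hypothetical and only mirrors the rule, not MMEngine's internal code.

```python
def scaled_lr(base_lr, num_gpus, samples_per_gpu, base_batch_size):
    """Linear scaling rule: LR grows in proportion to the effective batch size."""
    effective_batch = num_gpus * samples_per_gpu
    ratio = effective_batch / base_batch_size
    return effective_batch, ratio, base_lr * ratio


# Values from this run: 32 GPUs, batch_size=8, base_batch_size=256, lr=0.01.
print(scaled_lr(0.01, 32, 8, 256))  # (256, 1.0, 0.01)
```

With the same rule, running this config on 8 GPUs instead would give an effective batch of 64 and a ratio of 0.25.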
2023/05/29 19:28:24 - mmengine - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.weight', 'fc.bias'}
2023/05/29 19:28:24 - mmengine - WARNING - init_weights of Recognizer2D has been called more than once.
Name of parameter - Initialization information

architecture.backbone.conv1.conv.weight - torch.Size([64, 3, 7, 7]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.conv1.bn.weight - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.conv1.bn.bias - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.0.conv1.conv.weight - torch.Size([64, 64, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.0.conv1.bn.weight - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.0.conv1.bn.bias - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.0.conv2.conv.weight - torch.Size([64, 64, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.0.conv2.bn.weight - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.0.conv2.bn.bias - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.0.conv3.conv.weight - torch.Size([256, 64, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.0.conv3.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.0.conv3.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.0.downsample.conv.weight - torch.Size([256, 64, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.0.downsample.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.0.downsample.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.1.conv1.conv.weight - torch.Size([64, 256, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.1.conv1.bn.weight - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.1.conv1.bn.bias - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.1.conv2.conv.weight - torch.Size([64, 64, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.1.conv2.bn.weight - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.1.conv2.bn.bias - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.1.conv3.conv.weight - torch.Size([256, 64, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.1.conv3.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.1.conv3.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.2.conv1.conv.weight - torch.Size([64, 256, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.2.conv1.bn.weight - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.2.conv1.bn.bias - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.2.conv2.conv.weight - torch.Size([64, 64, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.2.conv2.bn.weight - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.2.conv2.bn.bias - torch.Size([64]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.2.conv3.conv.weight - torch.Size([256, 64, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.2.conv3.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer1.2.conv3.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.0.conv1.conv.weight - torch.Size([128, 256, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.0.conv1.bn.weight - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.0.conv1.bn.bias - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.0.conv2.conv.weight - torch.Size([128, 128, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.0.conv2.bn.weight - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.0.conv2.bn.bias - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.0.conv3.conv.weight - torch.Size([512, 128, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.0.conv3.bn.weight - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.0.conv3.bn.bias - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.0.downsample.conv.weight - torch.Size([512, 256, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.0.downsample.bn.weight - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.0.downsample.bn.bias - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.1.conv1.conv.weight - torch.Size([128, 512, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.1.conv1.bn.weight - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.1.conv1.bn.bias - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.1.conv2.conv.weight - torch.Size([128, 128, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.1.conv2.bn.weight - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.1.conv2.bn.bias - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.1.conv3.conv.weight - torch.Size([512, 128, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.1.conv3.bn.weight - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.1.conv3.bn.bias - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.2.conv1.conv.weight - torch.Size([128, 512, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.2.conv1.bn.weight - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.2.conv1.bn.bias - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.2.conv2.conv.weight - torch.Size([128, 128, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.2.conv2.bn.weight - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.2.conv2.bn.bias - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.2.conv3.conv.weight - torch.Size([512, 128, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.2.conv3.bn.weight - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.2.conv3.bn.bias - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.3.conv1.conv.weight - torch.Size([128, 512, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.3.conv1.bn.weight - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.3.conv1.bn.bias - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.3.conv2.conv.weight - torch.Size([128, 128, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.3.conv2.bn.weight - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.3.conv2.bn.bias - torch.Size([128]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.3.conv3.conv.weight - torch.Size([512, 128, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.3.conv3.bn.weight - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer2.3.conv3.bn.bias - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.0.conv1.conv.weight - torch.Size([256, 512, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.0.conv1.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.0.conv1.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.0.conv2.conv.weight - torch.Size([256, 256, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.0.conv2.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.0.conv2.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.0.conv3.conv.weight - torch.Size([1024, 256, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.0.conv3.bn.weight - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.0.conv3.bn.bias - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.0.downsample.conv.weight - torch.Size([1024, 512, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.0.downsample.bn.weight - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.0.downsample.bn.bias - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.1.conv1.conv.weight - torch.Size([256, 1024, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.1.conv1.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.1.conv1.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.1.conv2.conv.weight - torch.Size([256, 256, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.1.conv2.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.1.conv2.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.1.conv3.conv.weight - torch.Size([1024, 256, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.1.conv3.bn.weight - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.1.conv3.bn.bias - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.2.conv1.conv.weight - torch.Size([256, 1024, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.2.conv1.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.2.conv1.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.2.conv2.conv.weight - torch.Size([256, 256, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.2.conv2.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.2.conv2.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.2.conv3.conv.weight - torch.Size([1024, 256, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.2.conv3.bn.weight - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.2.conv3.bn.bias - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.3.conv1.conv.weight - torch.Size([256, 1024, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.3.conv1.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.3.conv1.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.3.conv2.conv.weight - torch.Size([256, 256, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.3.conv2.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.3.conv2.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.3.conv3.conv.weight - torch.Size([1024, 256, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.3.conv3.bn.weight - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.3.conv3.bn.bias - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.4.conv1.conv.weight - torch.Size([256, 1024, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.4.conv1.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.4.conv1.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.4.conv2.conv.weight - torch.Size([256, 256, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.4.conv2.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.4.conv2.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.4.conv3.conv.weight - torch.Size([1024, 256, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.4.conv3.bn.weight - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.4.conv3.bn.bias - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.5.conv1.conv.weight - torch.Size([256, 1024, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.5.conv1.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.5.conv1.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.5.conv2.conv.weight - torch.Size([256, 256, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.5.conv2.bn.weight - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.5.conv2.bn.bias - torch.Size([256]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.5.conv3.conv.weight - torch.Size([1024, 256, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.5.conv3.bn.weight - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer3.5.conv3.bn.bias - torch.Size([1024]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.0.conv1.conv.weight - torch.Size([512, 1024, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.0.conv1.bn.weight - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.0.conv1.bn.bias - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.0.conv2.conv.weight - torch.Size([512, 512, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.0.conv2.bn.weight - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.0.conv2.bn.bias - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.0.conv3.conv.weight - torch.Size([2048, 512, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.0.conv3.bn.weight - torch.Size([2048]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.0.conv3.bn.bias - torch.Size([2048]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.0.downsample.conv.weight - torch.Size([2048, 1024, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.0.downsample.bn.weight - torch.Size([2048]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.0.downsample.bn.bias - torch.Size([2048]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.1.conv1.conv.weight - torch.Size([512, 2048, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.1.conv1.bn.weight - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.1.conv1.bn.bias - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.1.conv2.conv.weight - torch.Size([512, 512, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.1.conv2.bn.weight - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.1.conv2.bn.bias - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.1.conv3.conv.weight - torch.Size([2048, 512, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.1.conv3.bn.weight - torch.Size([2048]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.1.conv3.bn.bias - torch.Size([2048]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.2.conv1.conv.weight - torch.Size([512, 2048, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.2.conv1.bn.weight - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.2.conv1.bn.bias - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.2.conv2.conv.weight - torch.Size([512, 512, 3, 3]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.2.conv2.bn.weight - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.2.conv2.bn.bias - torch.Size([512]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.2.conv3.conv.weight - torch.Size([2048, 512, 1, 1]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.2.conv3.bn.weight - torch.Size([2048]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.backbone.layer4.2.conv3.bn.bias - torch.Size([2048]): 
Initialized by user-defined `init_weights` in ResNet  

architecture.cls_head.fc_cls.weight - torch.Size([400, 2048]): 
Initialized by user-defined `init_weights` in TSNHead  

architecture.cls_head.fc_cls.bias - torch.Size([400]): 
Initialized by user-defined `init_weights` in TSNHead  

teacher.backbone.patch_embed.proj.weight - torch.Size([128, 3, 4, 4]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.patch_embed.proj.bias - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.patch_embed.norm.weight - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.patch_embed.norm.bias - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.0.norm1.weight - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.0.norm1.bias - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.0.attn.relative_position_bias_table - torch.Size([169, 4]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.0.attn.qkv.weight - torch.Size([384, 128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.0.attn.qkv.bias - torch.Size([384]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.0.attn.proj.weight - torch.Size([128, 128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.0.attn.proj.bias - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.0.norm2.weight - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.0.norm2.bias - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.0.mlp.fc1.weight - torch.Size([512, 128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.0.mlp.fc1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.0.mlp.fc2.weight - torch.Size([128, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.0.mlp.fc2.bias - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.1.norm1.weight - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.1.norm1.bias - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.1.attn.relative_position_bias_table - torch.Size([169, 4]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.1.attn.qkv.weight - torch.Size([384, 128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.1.attn.qkv.bias - torch.Size([384]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.1.attn.proj.weight - torch.Size([128, 128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.1.attn.proj.bias - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.1.norm2.weight - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.1.norm2.bias - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.1.mlp.fc1.weight - torch.Size([512, 128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.1.mlp.fc1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.1.mlp.fc2.weight - torch.Size([128, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.blocks.1.mlp.fc2.bias - torch.Size([128]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.downsample.norm.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.downsample.norm.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.0.downsample.reduction.weight - torch.Size([256, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.0.norm1.weight - torch.Size([256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.0.norm1.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.0.attn.relative_position_bias_table - torch.Size([169, 8]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.0.attn.qkv.weight - torch.Size([768, 256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.0.attn.qkv.bias - torch.Size([768]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.0.attn.proj.weight - torch.Size([256, 256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.0.attn.proj.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.0.norm2.weight - torch.Size([256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.0.norm2.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.0.mlp.fc1.weight - torch.Size([1024, 256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.0.mlp.fc1.bias - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.0.mlp.fc2.weight - torch.Size([256, 1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.0.mlp.fc2.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.1.norm1.weight - torch.Size([256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.1.norm1.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.1.attn.relative_position_bias_table - torch.Size([169, 8]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.1.attn.qkv.weight - torch.Size([768, 256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.1.attn.qkv.bias - torch.Size([768]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.1.attn.proj.weight - torch.Size([256, 256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.1.attn.proj.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.1.norm2.weight - torch.Size([256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.1.norm2.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.1.mlp.fc1.weight - torch.Size([1024, 256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.1.mlp.fc1.bias - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.1.mlp.fc2.weight - torch.Size([256, 1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.blocks.1.mlp.fc2.bias - torch.Size([256]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.downsample.norm.weight - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.downsample.norm.bias - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.1.downsample.reduction.weight - torch.Size([512, 1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.0.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.0.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.0.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.0.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.0.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.0.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.0.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.0.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.0.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.0.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.0.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.0.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.0.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.1.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.1.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.1.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.1.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.1.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.1.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.1.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.1.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.1.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.1.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.1.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.1.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.1.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.2.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.2.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.2.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.2.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.2.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.2.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.2.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.2.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.2.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.2.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.2.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.2.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.2.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.3.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.3.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.3.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.3.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.3.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.3.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.3.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.3.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.3.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.3.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.3.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.3.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.3.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.4.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.4.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.4.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.4.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.4.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.4.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.4.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.4.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.4.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.4.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.4.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.4.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.4.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.5.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.5.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.5.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.5.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.5.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.5.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.5.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.5.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.5.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.5.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.5.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.5.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.5.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.6.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.6.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.6.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.6.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.6.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.6.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.6.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.6.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.6.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.6.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.6.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.6.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.6.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.7.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.7.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.7.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.7.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.7.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.7.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.7.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.7.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.7.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.7.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.7.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.7.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.7.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.8.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.8.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.8.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.8.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.8.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.8.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.8.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.8.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.8.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.8.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.8.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.8.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.8.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.9.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.9.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.9.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.9.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.9.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.9.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.9.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.9.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.9.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.9.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.9.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.9.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.9.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.10.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.10.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.10.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.10.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.10.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.10.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.10.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.10.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.10.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.10.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.10.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.10.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.10.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.11.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.11.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.11.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.11.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.11.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.11.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.11.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.11.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.11.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.11.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.11.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.11.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.11.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.12.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.12.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.12.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.12.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.12.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.12.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.12.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.12.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.12.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.12.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.12.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.12.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.12.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.13.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.13.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.13.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.13.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.13.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.13.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.13.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.13.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.13.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.13.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.13.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.13.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.13.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.14.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.14.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.14.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.14.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.14.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.14.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.14.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.14.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.14.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.14.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.14.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.14.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.14.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.15.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.15.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.15.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.15.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.15.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.15.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.15.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.15.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.15.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.15.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.15.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.15.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.15.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.16.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.16.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.16.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.16.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.16.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.16.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.16.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.16.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.16.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.16.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.16.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.16.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.16.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.17.norm1.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.17.norm1.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.17.attn.relative_position_bias_table - torch.Size([169, 16]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.17.attn.qkv.weight - torch.Size([1536, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.17.attn.qkv.bias - torch.Size([1536]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.17.attn.proj.weight - torch.Size([512, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.17.attn.proj.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.17.norm2.weight - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.17.norm2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.17.mlp.fc1.weight - torch.Size([2048, 512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.17.mlp.fc1.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.17.mlp.fc2.weight - torch.Size([512, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.blocks.17.mlp.fc2.bias - torch.Size([512]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.downsample.norm.weight - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.downsample.norm.bias - torch.Size([2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.2.downsample.reduction.weight - torch.Size([1024, 2048]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.0.norm1.weight - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.0.norm1.bias - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.0.attn.relative_position_bias_table - torch.Size([169, 32]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.0.attn.qkv.weight - torch.Size([3072, 1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.0.attn.qkv.bias - torch.Size([3072]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.0.attn.proj.weight - torch.Size([1024, 1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.0.attn.proj.bias - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.0.norm2.weight - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.0.norm2.bias - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.0.mlp.fc1.weight - torch.Size([4096, 1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.0.mlp.fc1.bias - torch.Size([4096]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.0.mlp.fc2.weight - torch.Size([1024, 4096]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.0.mlp.fc2.bias - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.1.norm1.weight - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.1.norm1.bias - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.1.attn.relative_position_bias_table - torch.Size([169, 32]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.1.attn.qkv.weight - torch.Size([3072, 1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.1.attn.qkv.bias - torch.Size([3072]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.1.attn.proj.weight - torch.Size([1024, 1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.1.attn.proj.bias - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.1.norm2.weight - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.1.norm2.bias - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.1.mlp.fc1.weight - torch.Size([4096, 1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.1.mlp.fc1.bias - torch.Size([4096]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.1.mlp.fc2.weight - torch.Size([1024, 4096]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.layers.3.blocks.1.mlp.fc2.bias - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.norm.weight - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.backbone.norm.bias - torch.Size([1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.cls_head.fc_cls.weight - torch.Size([400, 1024]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  

teacher.cls_head.fc_cls.bias - torch.Size([400]): 
The value is the same before and after calling `init_weights` of SingleTeacherDistill  
2023/05/29 19:28:24 - mmengine - INFO - Auto resumed from the latest checkpoint /mnt/data/mmact/lilin/Repos/mmaction2/work_dirs/tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4/epoch_96.pth.
2023/05/29 19:28:24 - mmengine - INFO - Load checkpoint from /mnt/data/mmact/lilin/Repos/mmaction2/work_dirs/tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4/epoch_96.pth
2023/05/29 19:28:24 - mmengine - INFO - resumed epoch: 96, iter: 90240
2023/05/29 19:28:24 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
2023/05/29 19:28:24 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
2023/05/29 19:28:24 - mmengine - INFO - Checkpoints will be saved to /mnt/data/mmact/lilin/Repos/mmaction2/work_dirs/tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4.
2023/05/29 19:28:41 - mmengine - INFO - Epoch(train)  [97][ 20/940]  lr: 1.0000e-04  eta: 0:51:18  time: 0.8230  data_time: 0.1149  memory: 6021  grad_norm: 6.5763  loss: 3.6471  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 1.1103  distill.loss_dist: 2.5368
2023/05/29 19:28:49 - mmengine - INFO - Epoch(train)  [97][ 40/940]  lr: 1.0000e-04  eta: 0:38:09  time: 0.4079  data_time: 0.0076  memory: 6021  grad_norm: 6.6816  loss: 3.3565  student.top1_acc: 0.7500  student.top5_acc: 0.7500  student.loss_cls: 1.0646  distill.loss_dist: 2.2918
2023/05/29 19:28:57 - mmengine - INFO - Epoch(train)  [97][ 60/940]  lr: 1.0000e-04  eta: 0:33:40  time: 0.4070  data_time: 0.0075  memory: 6021  grad_norm: 6.7736  loss: 3.6660  student.top1_acc: 0.6250  student.top5_acc: 0.7500  student.loss_cls: 1.0820  distill.loss_dist: 2.5840
2023/05/29 19:29:05 - mmengine - INFO - Epoch(train)  [97][ 80/940]  lr: 1.0000e-04  eta: 0:31:13  time: 0.3986  data_time: 0.0072  memory: 6021  grad_norm: 6.6814  loss: 3.4649  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.1160  distill.loss_dist: 2.3488
2023/05/29 19:29:13 - mmengine - INFO - Epoch(train)  [97][100/940]  lr: 1.0000e-04  eta: 0:29:48  time: 0.4067  data_time: 0.0073  memory: 6021  grad_norm: 6.7277  loss: 3.3342  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 0.7542  distill.loss_dist: 2.5800
2023/05/29 19:29:22 - mmengine - INFO - Epoch(train)  [97][120/940]  lr: 1.0000e-04  eta: 0:28:48  time: 0.4066  data_time: 0.0074  memory: 6021  grad_norm: 6.5996  loss: 3.3161  student.top1_acc: 0.6250  student.top5_acc: 1.0000  student.loss_cls: 1.1498  distill.loss_dist: 2.1664
2023/05/29 19:29:30 - mmengine - INFO - Epoch(train)  [97][140/940]  lr: 1.0000e-04  eta: 0:28:03  time: 0.4058  data_time: 0.0074  memory: 6021  grad_norm: 6.5602  loss: 3.5661  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.0614  distill.loss_dist: 2.5047
2023/05/29 19:29:38 - mmengine - INFO - Epoch(train)  [97][160/940]  lr: 1.0000e-04  eta: 0:27:25  time: 0.4007  data_time: 0.0077  memory: 6021  grad_norm: 6.8317  loss: 3.4784  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.0921  distill.loss_dist: 2.3862
2023/05/29 19:29:46 - mmengine - INFO - Epoch(train)  [97][180/940]  lr: 1.0000e-04  eta: 0:26:56  time: 0.4083  data_time: 0.0075  memory: 6021  grad_norm: 6.7810  loss: 3.3169  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.0456  distill.loss_dist: 2.2714
2023/05/29 19:29:54 - mmengine - INFO - Epoch(train)  [97][200/940]  lr: 1.0000e-04  eta: 0:26:32  time: 0.4081  data_time: 0.0074  memory: 6021  grad_norm: 6.7650  loss: 3.3738  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.0281  distill.loss_dist: 2.3457
2023/05/29 19:30:02 - mmengine - INFO - Epoch(train)  [97][220/940]  lr: 1.0000e-04  eta: 0:26:08  time: 0.4002  data_time: 0.0075  memory: 6021  grad_norm: 6.6280  loss: 3.6571  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.1640  distill.loss_dist: 2.4931
2023/05/29 19:30:10 - mmengine - INFO - Epoch(train)  [97][240/940]  lr: 1.0000e-04  eta: 0:25:49  time: 0.4085  data_time: 0.0072  memory: 6021  grad_norm: 6.7731  loss: 3.0371  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 0.7885  distill.loss_dist: 2.2486
2023/05/29 19:30:18 - mmengine - INFO - Epoch(train)  [97][260/940]  lr: 1.0000e-04  eta: 0:25:30  time: 0.4021  data_time: 0.0076  memory: 6021  grad_norm: 6.8296  loss: 3.2197  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 0.8845  distill.loss_dist: 2.3352
2023/05/29 19:30:26 - mmengine - INFO - Epoch(train)  [97][280/940]  lr: 1.0000e-04  eta: 0:25:12  time: 0.4016  data_time: 0.0080  memory: 6021  grad_norm: 6.5911  loss: 3.2333  student.top1_acc: 0.7500  student.top5_acc: 0.7500  student.loss_cls: 0.9141  distill.loss_dist: 2.3193
2023/05/29 19:30:34 - mmengine - INFO - Epoch(train)  [97][300/940]  lr: 1.0000e-04  eta: 0:24:58  time: 0.4106  data_time: 0.0072  memory: 6021  grad_norm: 6.9362  loss: 3.5899  student.top1_acc: 0.6250  student.top5_acc: 1.0000  student.loss_cls: 1.0922  distill.loss_dist: 2.4978
2023/05/29 19:30:43 - mmengine - INFO - Epoch(train)  [97][320/940]  lr: 1.0000e-04  eta: 0:24:43  time: 0.4022  data_time: 0.0072  memory: 6021  grad_norm: 6.5766  loss: 3.3917  student.top1_acc: 0.6250  student.top5_acc: 1.0000  student.loss_cls: 0.8606  distill.loss_dist: 2.5310
2023/05/29 19:30:51 - mmengine - INFO - Epoch(train)  [97][340/940]  lr: 1.0000e-04  eta: 0:24:30  time: 0.4115  data_time: 0.0076  memory: 6021  grad_norm: 6.5896  loss: 3.5896  student.top1_acc: 0.5000  student.top5_acc: 0.8750  student.loss_cls: 1.1493  distill.loss_dist: 2.4403
2023/05/29 19:30:59 - mmengine - INFO - Epoch(train)  [97][360/940]  lr: 1.0000e-04  eta: 0:24:18  time: 0.4110  data_time: 0.0088  memory: 6021  grad_norm: 6.6122  loss: 3.6441  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.0575  distill.loss_dist: 2.5866
2023/05/29 19:31:07 - mmengine - INFO - Epoch(train)  [97][380/940]  lr: 1.0000e-04  eta: 0:24:04  time: 0.4021  data_time: 0.0078  memory: 6021  grad_norm: 6.6822  loss: 3.4523  student.top1_acc: 0.6250  student.top5_acc: 1.0000  student.loss_cls: 0.9739  distill.loss_dist: 2.4784
2023/05/29 19:31:15 - mmengine - INFO - Epoch(train)  [97][400/940]  lr: 1.0000e-04  eta: 0:23:52  time: 0.4056  data_time: 0.0085  memory: 6021  grad_norm: 6.9727  loss: 3.7741  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.1899  distill.loss_dist: 2.5842
2023/05/29 19:31:23 - mmengine - INFO - Epoch(train)  [97][420/940]  lr: 1.0000e-04  eta: 0:23:41  time: 0.4088  data_time: 0.0077  memory: 6021  grad_norm: 6.5663  loss: 3.3281  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.0483  distill.loss_dist: 2.2798
2023/05/29 19:31:31 - mmengine - INFO - Epoch(train)  [97][440/940]  lr: 1.0000e-04  eta: 0:23:29  time: 0.4008  data_time: 0.0078  memory: 6021  grad_norm: 6.7992  loss: 3.3939  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.1602  distill.loss_dist: 2.2338
2023/05/29 19:31:39 - mmengine - INFO - Epoch(train)  [97][460/940]  lr: 1.0000e-04  eta: 0:23:17  time: 0.4019  data_time: 0.0076  memory: 6021  grad_norm: 6.7564  loss: 3.5182  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 1.1412  distill.loss_dist: 2.3770
2023/05/29 19:31:47 - mmengine - INFO - Epoch(train)  [97][480/940]  lr: 1.0000e-04  eta: 0:23:05  time: 0.4014  data_time: 0.0075  memory: 6021  grad_norm: 6.6893  loss: 3.3538  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 1.1497  distill.loss_dist: 2.2041
2023/05/29 19:31:55 - mmengine - INFO - Epoch(train)  [97][500/940]  lr: 1.0000e-04  eta: 0:22:54  time: 0.4018  data_time: 0.0076  memory: 6021  grad_norm: 6.5555  loss: 3.5464  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 0.9375  distill.loss_dist: 2.6089
2023/05/29 19:32:04 - mmengine - INFO - Epoch(train)  [97][520/940]  lr: 1.0000e-04  eta: 0:22:44  time: 0.4087  data_time: 0.0074  memory: 6021  grad_norm: 6.6423  loss: 3.1308  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 0.8301  distill.loss_dist: 2.3008
2023/05/29 19:32:12 - mmengine - INFO - Epoch(train)  [97][540/940]  lr: 1.0000e-04  eta: 0:22:34  time: 0.4094  data_time: 0.0076  memory: 6021  grad_norm: 6.6711  loss: 3.4939  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.0330  distill.loss_dist: 2.4609
2023/05/29 19:32:20 - mmengine - INFO - Epoch(train)  [97][560/940]  lr: 1.0000e-04  eta: 0:22:24  time: 0.4016  data_time: 0.0079  memory: 6021  grad_norm: 6.6909  loss: 3.7175  student.top1_acc: 0.7500  student.top5_acc: 0.7500  student.loss_cls: 1.2732  distill.loss_dist: 2.4443
2023/05/29 19:32:28 - mmengine - INFO - Epoch(train)  [97][580/940]  lr: 1.0000e-04  eta: 0:22:13  time: 0.4005  data_time: 0.0076  memory: 6021  grad_norm: 6.4537  loss: 3.3739  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.0645  distill.loss_dist: 2.3095
2023/05/29 19:32:36 - mmengine - INFO - Epoch(train)  [97][600/940]  lr: 1.0000e-04  eta: 0:22:03  time: 0.4012  data_time: 0.0076  memory: 6021  grad_norm: 6.7339  loss: 3.4291  student.top1_acc: 0.5000  student.top5_acc: 0.7500  student.loss_cls: 1.1228  distill.loss_dist: 2.3063
2023/05/29 19:32:44 - mmengine - INFO - Epoch(train)  [97][620/940]  lr: 1.0000e-04  eta: 0:21:53  time: 0.4005  data_time: 0.0077  memory: 6021  grad_norm: 6.7564  loss: 3.5098  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.1025  distill.loss_dist: 2.4073
2023/05/29 19:32:52 - mmengine - INFO - Epoch(train)  [97][640/940]  lr: 1.0000e-04  eta: 0:21:43  time: 0.4011  data_time: 0.0076  memory: 6021  grad_norm: 6.6798  loss: 3.5989  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 1.1539  distill.loss_dist: 2.4451
2023/05/29 19:33:00 - mmengine - INFO - Epoch(train)  [97][660/940]  lr: 1.0000e-04  eta: 0:21:33  time: 0.4082  data_time: 0.0076  memory: 6021  grad_norm: 6.6446  loss: 3.5282  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.0545  distill.loss_dist: 2.4737
2023/05/29 19:33:08 - mmengine - INFO - Epoch(train)  [97][680/940]  lr: 1.0000e-04  eta: 0:21:24  time: 0.4003  data_time: 0.0077  memory: 6021  grad_norm: 6.7553  loss: 3.5235  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 0.9646  distill.loss_dist: 2.5590
2023/05/29 19:33:16 - mmengine - INFO - Epoch(train)  [97][700/940]  lr: 1.0000e-04  eta: 0:21:14  time: 0.4005  data_time: 0.0079  memory: 6021  grad_norm: 6.7076  loss: 3.9051  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 1.2818  distill.loss_dist: 2.6233
2023/05/29 19:33:24 - mmengine - INFO - Epoch(train)  [97][720/940]  lr: 1.0000e-04  eta: 0:21:05  time: 0.4140  data_time: 0.0080  memory: 6021  grad_norm: 6.7916  loss: 3.3596  student.top1_acc: 0.5000  student.top5_acc: 0.7500  student.loss_cls: 0.9876  distill.loss_dist: 2.3720
2023/05/29 19:33:33 - mmengine - INFO - Epoch(train)  [97][740/940]  lr: 1.0000e-04  eta: 0:20:56  time: 0.4078  data_time: 0.0077  memory: 6021  grad_norm: 6.6358  loss: 3.4840  student.top1_acc: 0.7500  student.top5_acc: 0.7500  student.loss_cls: 1.2025  distill.loss_dist: 2.2814
2023/05/29 19:33:41 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809
2023/05/29 19:33:41 - mmengine - INFO - Epoch(train)  [97][760/940]  lr: 1.0000e-04  eta: 0:20:47  time: 0.3987  data_time: 0.0076  memory: 6021  grad_norm: 6.7344  loss: 3.6627  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.2234  distill.loss_dist: 2.4393
2023/05/29 19:33:49 - mmengine - INFO - Epoch(train)  [97][780/940]  lr: 1.0000e-04  eta: 0:20:38  time: 0.4079  data_time: 0.0074  memory: 6021  grad_norm: 6.6079  loss: 3.2952  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 0.7731  distill.loss_dist: 2.5221
2023/05/29 19:33:57 - mmengine - INFO - Epoch(train)  [97][800/940]  lr: 1.0000e-04  eta: 0:20:29  time: 0.4064  data_time: 0.0076  memory: 6021  grad_norm: 6.6835  loss: 3.6692  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.1446  distill.loss_dist: 2.5246
2023/05/29 19:34:05 - mmengine - INFO - Epoch(train)  [97][820/940]  lr: 1.0000e-04  eta: 0:20:19  time: 0.4019  data_time: 0.0071  memory: 6021  grad_norm: 6.8290  loss: 3.7493  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.1370  distill.loss_dist: 2.6123
2023/05/29 19:34:13 - mmengine - INFO - Epoch(train)  [97][840/940]  lr: 1.0000e-04  eta: 0:20:11  time: 0.4088  data_time: 0.0077  memory: 6021  grad_norm: 6.5624  loss: 3.4935  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 1.1180  distill.loss_dist: 2.3754
2023/05/29 19:34:21 - mmengine - INFO - Epoch(train)  [97][860/940]  lr: 1.0000e-04  eta: 0:20:02  time: 0.4075  data_time: 0.0076  memory: 6021  grad_norm: 6.6359  loss: 3.6196  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.1408  distill.loss_dist: 2.4788
2023/05/29 19:34:29 - mmengine - INFO - Epoch(train)  [97][880/940]  lr: 1.0000e-04  eta: 0:19:53  time: 0.4015  data_time: 0.0078  memory: 6021  grad_norm: 6.8319  loss: 3.6201  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.0598  distill.loss_dist: 2.5603
2023/05/29 19:34:37 - mmengine - INFO - Epoch(train)  [97][900/940]  lr: 1.0000e-04  eta: 0:19:44  time: 0.4017  data_time: 0.0078  memory: 6021  grad_norm: 6.6837  loss: 3.7207  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.1297  distill.loss_dist: 2.5910
2023/05/29 19:34:45 - mmengine - INFO - Epoch(train)  [97][920/940]  lr: 1.0000e-04  eta: 0:19:34  time: 0.4005  data_time: 0.0079  memory: 6021  grad_norm: 6.5485  loss: 3.5911  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 1.1783  distill.loss_dist: 2.4127
2023/05/29 19:34:53 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809
2023/05/29 19:34:53 - mmengine - INFO - Epoch(train)  [97][940/940]  lr: 1.0000e-04  eta: 0:19:24  time: 0.3813  data_time: 0.0069  memory: 6021  grad_norm: 7.0299  loss: 3.6984  student.top1_acc: 0.5000  student.top5_acc: 1.0000  student.loss_cls: 1.1335  distill.loss_dist: 2.5649
2023/05/29 19:35:01 - mmengine - INFO - Epoch(val)  [97][20/78]    eta: 0:00:22  time: 0.3823  data_time: 0.3298  memory: 1444  
2023/05/29 19:35:05 - mmengine - INFO - Epoch(val)  [97][40/78]    eta: 0:00:11  time: 0.2025  data_time: 0.1509  memory: 1444  
2023/05/29 19:35:11 - mmengine - INFO - Epoch(val)  [97][60/78]    eta: 0:00:05  time: 0.2924  data_time: 0.2409  memory: 1444  
2023/05/29 19:35:22 - mmengine - INFO - Epoch(val)  [97][20/78]    eta: 0:01:19  time: 0.3425  data_time: 0.1587  memory: 2227  
2023/05/29 19:35:27 - mmengine - INFO - Epoch(val)  [97][40/78]    eta: 0:00:31  time: 0.2619  data_time: 0.0781  memory: 2227  
2023/05/29 19:35:33 - mmengine - INFO - Epoch(val)  [97][60/78]    eta: 0:00:11  time: 0.2719  data_time: 0.0879  memory: 2227  
2023/05/29 19:35:38 - mmengine - INFO - Epoch(val) [97][78/78]    acc/top1: 0.7303  acc/top5: 0.9076  acc/mean1: 0.7302  teacher.acc/top1: 0.7727  teacher.acc/top5: 0.9298  teacher.acc/mean1: 0.7726  data_time: 0.0839  time: 0.2641
2023/05/29 19:35:48 - mmengine - INFO - Epoch(train)  [98][ 20/940]  lr: 1.0000e-04  eta: 0:19:22  time: 0.5179  data_time: 0.0843  memory: 6021  grad_norm: 6.6476  loss: 3.4853  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 1.1274  distill.loss_dist: 2.3579
2023/05/29 19:35:56 - mmengine - INFO - Epoch(train)  [98][ 40/940]  lr: 1.0000e-04  eta: 0:19:13  time: 0.4002  data_time: 0.0075  memory: 6021  grad_norm: 6.5761  loss: 3.5652  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.3698  distill.loss_dist: 2.1954
2023/05/29 19:36:04 - mmengine - INFO - Epoch(train)  [98][ 60/940]  lr: 1.0000e-04  eta: 0:19:04  time: 0.4011  data_time: 0.0076  memory: 6021  grad_norm: 6.6800  loss: 3.2708  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 1.1237  distill.loss_dist: 2.1471
2023/05/29 19:36:12 - mmengine - INFO - Epoch(train)  [98][ 80/940]  lr: 1.0000e-04  eta: 0:18:55  time: 0.4029  data_time: 0.0076  memory: 6021  grad_norm: 6.7007  loss: 3.0832  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 0.7699  distill.loss_dist: 2.3133
2023/05/29 19:36:20 - mmengine - INFO - Epoch(train)  [98][100/940]  lr: 1.0000e-04  eta: 0:18:46  time: 0.4050  data_time: 0.0075  memory: 6021  grad_norm: 6.9197  loss: 3.7876  student.top1_acc: 0.3750  student.top5_acc: 0.5000  student.loss_cls: 1.1995  distill.loss_dist: 2.5881
2023/05/29 19:36:29 - mmengine - INFO - Epoch(train)  [98][120/940]  lr: 1.0000e-04  eta: 0:18:38  time: 0.4084  data_time: 0.0083  memory: 6021  grad_norm: 6.7849  loss: 3.5438  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.2907  distill.loss_dist: 2.2532
2023/05/29 19:36:37 - mmengine - INFO - Epoch(train)  [98][140/940]  lr: 1.0000e-04  eta: 0:18:29  time: 0.4062  data_time: 0.0076  memory: 6021  grad_norm: 6.6913  loss: 3.5529  student.top1_acc: 0.5000  student.top5_acc: 0.7500  student.loss_cls: 1.2141  distill.loss_dist: 2.3388
2023/05/29 19:36:45 - mmengine - INFO - Epoch(train)  [98][160/940]  lr: 1.0000e-04  eta: 0:18:20  time: 0.4072  data_time: 0.0077  memory: 6021  grad_norm: 6.5471  loss: 3.6248  student.top1_acc: 0.6250  student.top5_acc: 0.7500  student.loss_cls: 1.0056  distill.loss_dist: 2.6191
2023/05/29 19:36:53 - mmengine - INFO - Epoch(train)  [98][180/940]  lr: 1.0000e-04  eta: 0:18:12  time: 0.4104  data_time: 0.0078  memory: 6021  grad_norm: 6.5732  loss: 3.6400  student.top1_acc: 0.5000  student.top5_acc: 1.0000  student.loss_cls: 1.2101  distill.loss_dist: 2.4300
2023/05/29 19:37:01 - mmengine - INFO - Epoch(train)  [98][200/940]  lr: 1.0000e-04  eta: 0:18:03  time: 0.4008  data_time: 0.0074  memory: 6021  grad_norm: 6.5725  loss: 3.3579  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.2280  distill.loss_dist: 2.1298
2023/05/29 19:37:09 - mmengine - INFO - Epoch(train)  [98][220/940]  lr: 1.0000e-04  eta: 0:17:54  time: 0.4050  data_time: 0.0082  memory: 6021  grad_norm: 6.4721  loss: 3.5251  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.1121  distill.loss_dist: 2.4130
2023/05/29 19:37:17 - mmengine - INFO - Epoch(train)  [98][240/940]  lr: 1.0000e-04  eta: 0:17:46  time: 0.4087  data_time: 0.0077  memory: 6021  grad_norm: 6.6136  loss: 3.1402  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 0.9086  distill.loss_dist: 2.2316
2023/05/29 19:37:25 - mmengine - INFO - Epoch(train)  [98][260/940]  lr: 1.0000e-04  eta: 0:17:37  time: 0.3989  data_time: 0.0077  memory: 6021  grad_norm: 6.6526  loss: 3.5407  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.1105  distill.loss_dist: 2.4302
2023/05/29 19:37:34 - mmengine - INFO - Epoch(train)  [98][280/940]  lr: 1.0000e-04  eta: 0:17:29  time: 0.4074  data_time: 0.0080  memory: 6021  grad_norm: 6.7712  loss: 3.7253  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.1347  distill.loss_dist: 2.5906
2023/05/29 19:37:42 - mmengine - INFO - Epoch(train)  [98][300/940]  lr: 1.0000e-04  eta: 0:17:20  time: 0.4098  data_time: 0.0075  memory: 6021  grad_norm: 6.6427  loss: 3.4748  student.top1_acc: 0.5000  student.top5_acc: 1.0000  student.loss_cls: 1.0991  distill.loss_dist: 2.3757
2023/05/29 19:37:50 - mmengine - INFO - Epoch(train)  [98][320/940]  lr: 1.0000e-04  eta: 0:17:12  time: 0.4064  data_time: 0.0080  memory: 6021  grad_norm: 6.8392  loss: 3.4791  student.top1_acc: 0.6250  student.top5_acc: 0.7500  student.loss_cls: 1.1006  distill.loss_dist: 2.3785
2023/05/29 19:37:58 - mmengine - INFO - Epoch(train)  [98][340/940]  lr: 1.0000e-04  eta: 0:17:03  time: 0.4102  data_time: 0.0084  memory: 6021  grad_norm: 6.7068  loss: 3.6548  student.top1_acc: 0.3750  student.top5_acc: 0.7500  student.loss_cls: 1.1245  distill.loss_dist: 2.5303
2023/05/29 19:38:06 - mmengine - INFO - Epoch(train)  [98][360/940]  lr: 1.0000e-04  eta: 0:16:55  time: 0.4011  data_time: 0.0083  memory: 6021  grad_norm: 6.7447  loss: 4.0055  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.1639  distill.loss_dist: 2.8416
2023/05/29 19:38:14 - mmengine - INFO - Epoch(train)  [98][380/940]  lr: 1.0000e-04  eta: 0:16:46  time: 0.4060  data_time: 0.0077  memory: 6021  grad_norm: 6.5377  loss: 3.2911  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 1.0291  distill.loss_dist: 2.2620
2023/05/29 19:38:22 - mmengine - INFO - Epoch(train)  [98][400/940]  lr: 1.0000e-04  eta: 0:16:38  time: 0.4084  data_time: 0.0081  memory: 6021  grad_norm: 6.7597  loss: 3.6150  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.1062  distill.loss_dist: 2.5088
2023/05/29 19:38:31 - mmengine - INFO - Epoch(train)  [98][420/940]  lr: 1.0000e-04  eta: 0:16:29  time: 0.4087  data_time: 0.0076  memory: 6021  grad_norm: 6.5082  loss: 3.6350  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.1374  distill.loss_dist: 2.4976
2023/05/29 19:38:39 - mmengine - INFO - Epoch(train)  [98][440/940]  lr: 1.0000e-04  eta: 0:16:21  time: 0.4084  data_time: 0.0075  memory: 6021  grad_norm: 6.7288  loss: 3.4664  student.top1_acc: 0.7500  student.top5_acc: 0.7500  student.loss_cls: 1.1111  distill.loss_dist: 2.3553
2023/05/29 19:38:47 - mmengine - INFO - Epoch(train)  [98][460/940]  lr: 1.0000e-04  eta: 0:16:12  time: 0.4014  data_time: 0.0077  memory: 6021  grad_norm: 6.5957  loss: 3.2907  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.0395  distill.loss_dist: 2.2512
2023/05/29 19:38:55 - mmengine - INFO - Epoch(train)  [98][480/940]  lr: 1.0000e-04  eta: 0:16:04  time: 0.4012  data_time: 0.0072  memory: 6021  grad_norm: 6.7199  loss: 3.7779  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.1209  distill.loss_dist: 2.6569
2023/05/29 19:39:03 - mmengine - INFO - Epoch(train)  [98][500/940]  lr: 1.0000e-04  eta: 0:15:55  time: 0.4093  data_time: 0.0074  memory: 6021  grad_norm: 6.8476  loss: 3.4980  student.top1_acc: 0.5000  student.top5_acc: 0.7500  student.loss_cls: 1.2118  distill.loss_dist: 2.2862
2023/05/29 19:39:11 - mmengine - INFO - Epoch(train)  [98][520/940]  lr: 1.0000e-04  eta: 0:15:47  time: 0.4020  data_time: 0.0076  memory: 6021  grad_norm: 6.6238  loss: 3.6866  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.3669  distill.loss_dist: 2.3197
2023/05/29 19:39:19 - mmengine - INFO - Epoch(train)  [98][540/940]  lr: 1.0000e-04  eta: 0:15:38  time: 0.4079  data_time: 0.0077  memory: 6021  grad_norm: 6.3276  loss: 3.4668  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 1.0989  distill.loss_dist: 2.3679
2023/05/29 19:39:27 - mmengine - INFO - Epoch(train)  [98][560/940]  lr: 1.0000e-04  eta: 0:15:30  time: 0.4103  data_time: 0.0080  memory: 6021  grad_norm: 6.6249  loss: 3.2230  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.0226  distill.loss_dist: 2.2004
2023/05/29 19:39:36 - mmengine - INFO - Epoch(train)  [98][580/940]  lr: 1.0000e-04  eta: 0:15:22  time: 0.4031  data_time: 0.0075  memory: 6021  grad_norm: 6.4539  loss: 3.0269  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 0.9367  distill.loss_dist: 2.0902
2023/05/29 19:39:44 - mmengine - INFO - Epoch(train)  [98][600/940]  lr: 1.0000e-04  eta: 0:15:13  time: 0.4018  data_time: 0.0074  memory: 6021  grad_norm: 6.6116  loss: 3.6049  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.0690  distill.loss_dist: 2.5359
2023/05/29 19:39:52 - mmengine - INFO - Epoch(train)  [98][620/940]  lr: 1.0000e-04  eta: 0:15:05  time: 0.4037  data_time: 0.0073  memory: 6021  grad_norm: 6.6852  loss: 3.4575  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 0.9746  distill.loss_dist: 2.4829
2023/05/29 19:40:00 - mmengine - INFO - Epoch(train)  [98][640/940]  lr: 1.0000e-04  eta: 0:14:56  time: 0.4071  data_time: 0.0079  memory: 6021  grad_norm: 6.6771  loss: 3.3604  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.0005  distill.loss_dist: 2.3599
2023/05/29 19:40:08 - mmengine - INFO - Epoch(train)  [98][660/940]  lr: 1.0000e-04  eta: 0:14:48  time: 0.4013  data_time: 0.0070  memory: 6021  grad_norm: 6.5437  loss: 3.4247  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 1.1717  distill.loss_dist: 2.2530
2023/05/29 19:40:16 - mmengine - INFO - Epoch(train)  [98][680/940]  lr: 1.0000e-04  eta: 0:14:39  time: 0.4010  data_time: 0.0075  memory: 6021  grad_norm: 6.7671  loss: 3.5452  student.top1_acc: 0.7500  student.top5_acc: 0.7500  student.loss_cls: 1.3275  distill.loss_dist: 2.2177
2023/05/29 19:40:24 - mmengine - INFO - Epoch(train)  [98][700/940]  lr: 1.0000e-04  eta: 0:14:31  time: 0.4102  data_time: 0.0075  memory: 6021  grad_norm: 6.6439  loss: 3.3310  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.0230  distill.loss_dist: 2.3080
2023/05/29 19:40:32 - mmengine - INFO - Epoch(train)  [98][720/940]  lr: 1.0000e-04  eta: 0:14:23  time: 0.4011  data_time: 0.0073  memory: 6021  grad_norm: 6.6304  loss: 3.4180  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.0487  distill.loss_dist: 2.3692
2023/05/29 19:40:40 - mmengine - INFO - Epoch(train)  [98][740/940]  lr: 1.0000e-04  eta: 0:14:14  time: 0.4014  data_time: 0.0070  memory: 6021  grad_norm: 6.6456  loss: 3.4011  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.0667  distill.loss_dist: 2.3344
2023/05/29 19:40:48 - mmengine - INFO - Epoch(train)  [98][760/940]  lr: 1.0000e-04  eta: 0:14:06  time: 0.4087  data_time: 0.0072  memory: 6021  grad_norm: 6.6794  loss: 3.4682  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.0947  distill.loss_dist: 2.3735
2023/05/29 19:40:56 - mmengine - INFO - Epoch(train)  [98][780/940]  lr: 1.0000e-04  eta: 0:13:58  time: 0.4076  data_time: 0.0076  memory: 6021  grad_norm: 6.8405  loss: 3.3556  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 0.8674  distill.loss_dist: 2.4882
2023/05/29 19:41:05 - mmengine - INFO - Epoch(train)  [98][800/940]  lr: 1.0000e-04  eta: 0:13:49  time: 0.4063  data_time: 0.0076  memory: 6021  grad_norm: 6.6922  loss: 3.3954  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 1.2031  distill.loss_dist: 2.1924
2023/05/29 19:41:13 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809
2023/05/29 19:41:13 - mmengine - INFO - Epoch(train)  [98][820/940]  lr: 1.0000e-04  eta: 0:13:41  time: 0.4022  data_time: 0.0074  memory: 6021  grad_norm: 6.7688  loss: 3.4639  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.2353  distill.loss_dist: 2.2286
2023/05/29 19:41:21 - mmengine - INFO - Epoch(train)  [98][840/940]  lr: 1.0000e-04  eta: 0:13:33  time: 0.4098  data_time: 0.0077  memory: 6021  grad_norm: 6.7706  loss: 3.3413  student.top1_acc: 0.6250  student.top5_acc: 1.0000  student.loss_cls: 0.9879  distill.loss_dist: 2.3535
2023/05/29 19:41:29 - mmengine - INFO - Epoch(train)  [98][860/940]  lr: 1.0000e-04  eta: 0:13:24  time: 0.4013  data_time: 0.0072  memory: 6021  grad_norm: 6.8269  loss: 3.4975  student.top1_acc: 0.3750  student.top5_acc: 0.8750  student.loss_cls: 1.2236  distill.loss_dist: 2.2739
2023/05/29 19:41:37 - mmengine - INFO - Epoch(train)  [98][880/940]  lr: 1.0000e-04  eta: 0:13:16  time: 0.4016  data_time: 0.0076  memory: 6021  grad_norm: 6.6681  loss: 3.5887  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 1.1269  distill.loss_dist: 2.4618
2023/05/29 19:41:45 - mmengine - INFO - Epoch(train)  [98][900/940]  lr: 1.0000e-04  eta: 0:13:07  time: 0.4008  data_time: 0.0075  memory: 6021  grad_norm: 6.7021  loss: 3.6018  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 0.9680  distill.loss_dist: 2.6339
2023/05/29 19:41:53 - mmengine - INFO - Epoch(train)  [98][920/940]  lr: 1.0000e-04  eta: 0:12:59  time: 0.4005  data_time: 0.0077  memory: 6021  grad_norm: 6.7226  loss: 3.3504  student.top1_acc: 0.5000  student.top5_acc: 0.8750  student.loss_cls: 1.1069  distill.loss_dist: 2.2435
2023/05/29 19:42:01 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809
2023/05/29 19:42:01 - mmengine - INFO - Epoch(train)  [98][940/940]  lr: 1.0000e-04  eta: 0:12:50  time: 0.3915  data_time: 0.0069  memory: 6021  grad_norm: 7.2279  loss: 3.7729  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.0544  distill.loss_dist: 2.7185
2023/05/29 19:42:07 - mmengine - INFO - Epoch(val)  [98][20/78]    eta: 0:00:18  time: 0.3256  data_time: 0.2730  memory: 1444  
2023/05/29 19:42:12 - mmengine - INFO - Epoch(val)  [98][40/78]    eta: 0:00:10  time: 0.2108  data_time: 0.1590  memory: 1444  
2023/05/29 19:42:17 - mmengine - INFO - Epoch(val)  [98][60/78]    eta: 0:00:04  time: 0.2713  data_time: 0.2198  memory: 1444  
2023/05/29 19:42:29 - mmengine - INFO - Epoch(val)  [98][20/78]    eta: 0:01:15  time: 0.3468  data_time: 0.1631  memory: 2227  
2023/05/29 19:42:34 - mmengine - INFO - Epoch(val)  [98][40/78]    eta: 0:00:29  time: 0.2408  data_time: 0.0564  memory: 2227  
2023/05/29 19:42:39 - mmengine - INFO - Epoch(val)  [98][60/78]    eta: 0:00:10  time: 0.2511  data_time: 0.0671  memory: 2227  
2023/05/29 19:42:44 - mmengine - INFO - Epoch(val) [98][78/78]    acc/top1: 0.7301  acc/top5: 0.9080  acc/mean1: 0.7300  teacher.acc/top1: 0.7727  teacher.acc/top5: 0.9298  teacher.acc/mean1: 0.7726  data_time: 0.0780  time: 0.2586
2023/05/29 19:42:54 - mmengine - INFO - Epoch(train)  [99][ 20/940]  lr: 1.0000e-04  eta: 0:12:44  time: 0.5162  data_time: 0.0625  memory: 6021  grad_norm: 6.5578  loss: 3.3237  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 1.2855  distill.loss_dist: 2.0382
2023/05/29 19:43:03 - mmengine - INFO - Epoch(train)  [99][ 40/940]  lr: 1.0000e-04  eta: 0:12:36  time: 0.4095  data_time: 0.0076  memory: 6021  grad_norm: 6.8310  loss: 3.4588  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 0.9362  distill.loss_dist: 2.5226
2023/05/29 19:43:11 - mmengine - INFO - Epoch(train)  [99][ 60/940]  lr: 1.0000e-04  eta: 0:12:28  time: 0.4021  data_time: 0.0077  memory: 6021  grad_norm: 6.7432  loss: 3.6601  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.2053  distill.loss_dist: 2.4548
2023/05/29 19:43:19 - mmengine - INFO - Epoch(train)  [99][ 80/940]  lr: 1.0000e-04  eta: 0:12:19  time: 0.4091  data_time: 0.0073  memory: 6021  grad_norm: 6.7405  loss: 3.7851  student.top1_acc: 0.5000  student.top5_acc: 0.8750  student.loss_cls: 1.1699  distill.loss_dist: 2.6152
2023/05/29 19:43:27 - mmengine - INFO - Epoch(train)  [99][100/940]  lr: 1.0000e-04  eta: 0:12:11  time: 0.4021  data_time: 0.0078  memory: 6021  grad_norm: 6.5184  loss: 3.4570  student.top1_acc: 0.6250  student.top5_acc: 0.6250  student.loss_cls: 1.1725  distill.loss_dist: 2.2845
2023/05/29 19:43:35 - mmengine - INFO - Epoch(train)  [99][120/940]  lr: 1.0000e-04  eta: 0:12:03  time: 0.4104  data_time: 0.0076  memory: 6021  grad_norm: 6.6915  loss: 3.3968  student.top1_acc: 0.7500  student.top5_acc: 0.7500  student.loss_cls: 1.2046  distill.loss_dist: 2.1922
2023/05/29 19:43:43 - mmengine - INFO - Epoch(train)  [99][140/940]  lr: 1.0000e-04  eta: 0:11:54  time: 0.4072  data_time: 0.0076  memory: 6021  grad_norm: 6.7742  loss: 3.5145  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.1881  distill.loss_dist: 2.3263
2023/05/29 19:43:51 - mmengine - INFO - Epoch(train)  [99][160/940]  lr: 1.0000e-04  eta: 0:11:46  time: 0.4077  data_time: 0.0073  memory: 6021  grad_norm: 6.7853  loss: 3.6396  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 0.9922  distill.loss_dist: 2.6474
2023/05/29 19:43:59 - mmengine - INFO - Epoch(train)  [99][180/940]  lr: 1.0000e-04  eta: 0:11:38  time: 0.4002  data_time: 0.0073  memory: 6021  grad_norm: 6.5806  loss: 3.9444  student.top1_acc: 0.6250  student.top5_acc: 1.0000  student.loss_cls: 1.1127  distill.loss_dist: 2.8317
2023/05/29 19:44:07 - mmengine - INFO - Epoch(train)  [99][200/940]  lr: 1.0000e-04  eta: 0:11:29  time: 0.4001  data_time: 0.0075  memory: 6021  grad_norm: 6.7627  loss: 3.5107  student.top1_acc: 0.5000  student.top5_acc: 1.0000  student.loss_cls: 1.0501  distill.loss_dist: 2.4606
2023/05/29 19:44:15 - mmengine - INFO - Epoch(train)  [99][220/940]  lr: 1.0000e-04  eta: 0:11:21  time: 0.4014  data_time: 0.0074  memory: 6021  grad_norm: 6.5804  loss: 3.7871  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.2497  distill.loss_dist: 2.5374
2023/05/29 19:44:23 - mmengine - INFO - Epoch(train)  [99][240/940]  lr: 1.0000e-04  eta: 0:11:13  time: 0.4018  data_time: 0.0072  memory: 6021  grad_norm: 6.5318  loss: 3.4126  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 0.8909  distill.loss_dist: 2.5218
2023/05/29 19:44:31 - mmengine - INFO - Epoch(train)  [99][260/940]  lr: 1.0000e-04  eta: 0:11:04  time: 0.4044  data_time: 0.0073  memory: 6021  grad_norm: 6.4535  loss: 3.5633  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.1769  distill.loss_dist: 2.3864
2023/05/29 19:44:40 - mmengine - INFO - Epoch(train)  [99][280/940]  lr: 1.0000e-04  eta: 0:10:56  time: 0.4021  data_time: 0.0075  memory: 6021  grad_norm: 6.8424  loss: 3.6335  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.1811  distill.loss_dist: 2.4524
2023/05/29 19:44:48 - mmengine - INFO - Epoch(train)  [99][300/940]  lr: 1.0000e-04  eta: 0:10:48  time: 0.4003  data_time: 0.0073  memory: 6021  grad_norm: 6.6503  loss: 3.7403  student.top1_acc: 0.6250  student.top5_acc: 0.6250  student.loss_cls: 1.1847  distill.loss_dist: 2.5556
2023/05/29 19:44:56 - mmengine - INFO - Epoch(train)  [99][320/940]  lr: 1.0000e-04  eta: 0:10:40  time: 0.4090  data_time: 0.0072  memory: 6021  grad_norm: 6.5631  loss: 3.6877  student.top1_acc: 0.6250  student.top5_acc: 0.7500  student.loss_cls: 0.9942  distill.loss_dist: 2.6935
2023/05/29 19:45:04 - mmengine - INFO - Epoch(train)  [99][340/940]  lr: 1.0000e-04  eta: 0:10:31  time: 0.3989  data_time: 0.0074  memory: 6021  grad_norm: 6.7353  loss: 3.5808  student.top1_acc: 0.5000  student.top5_acc: 1.0000  student.loss_cls: 1.0909  distill.loss_dist: 2.4900
2023/05/29 19:45:12 - mmengine - INFO - Epoch(train)  [99][360/940]  lr: 1.0000e-04  eta: 0:10:23  time: 0.4094  data_time: 0.0073  memory: 6021  grad_norm: 6.6775  loss: 3.4691  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.1082  distill.loss_dist: 2.3609
2023/05/29 19:45:20 - mmengine - INFO - Epoch(train)  [99][380/940]  lr: 1.0000e-04  eta: 0:10:15  time: 0.4026  data_time: 0.0076  memory: 6021  grad_norm: 6.5416  loss: 3.1985  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.0278  distill.loss_dist: 2.1707
2023/05/29 19:45:28 - mmengine - INFO - Epoch(train)  [99][400/940]  lr: 1.0000e-04  eta: 0:10:06  time: 0.4072  data_time: 0.0075  memory: 6021  grad_norm: 6.6624  loss: 3.3202  student.top1_acc: 0.6250  student.top5_acc: 0.7500  student.loss_cls: 1.1027  distill.loss_dist: 2.2175
2023/05/29 19:45:36 - mmengine - INFO - Epoch(train)  [99][420/940]  lr: 1.0000e-04  eta: 0:09:58  time: 0.4021  data_time: 0.0073  memory: 6021  grad_norm: 6.5379  loss: 3.4516  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 1.1848  distill.loss_dist: 2.2668
2023/05/29 19:45:44 - mmengine - INFO - Epoch(train)  [99][440/940]  lr: 1.0000e-04  eta: 0:09:50  time: 0.4007  data_time: 0.0073  memory: 6021  grad_norm: 6.7396  loss: 3.5248  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.0534  distill.loss_dist: 2.4714
2023/05/29 19:45:52 - mmengine - INFO - Epoch(train)  [99][460/940]  lr: 1.0000e-04  eta: 0:09:41  time: 0.4031  data_time: 0.0074  memory: 6021  grad_norm: 6.5564  loss: 3.4996  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.0923  distill.loss_dist: 2.4073
2023/05/29 19:46:00 - mmengine - INFO - Epoch(train)  [99][480/940]  lr: 1.0000e-04  eta: 0:09:33  time: 0.4129  data_time: 0.0077  memory: 6021  grad_norm: 6.5327  loss: 3.5497  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.1243  distill.loss_dist: 2.4254
2023/05/29 19:46:09 - mmengine - INFO - Epoch(train)  [99][500/940]  lr: 1.0000e-04  eta: 0:09:25  time: 0.4011  data_time: 0.0083  memory: 6021  grad_norm: 6.6420  loss: 3.5821  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 0.9895  distill.loss_dist: 2.5926
2023/05/29 19:46:17 - mmengine - INFO - Epoch(train)  [99][520/940]  lr: 1.0000e-04  eta: 0:09:17  time: 0.3999  data_time: 0.0074  memory: 6021  grad_norm: 6.7157  loss: 3.7549  student.top1_acc: 0.6250  student.top5_acc: 1.0000  student.loss_cls: 1.1374  distill.loss_dist: 2.6175
2023/05/29 19:46:25 - mmengine - INFO - Epoch(train)  [99][540/940]  lr: 1.0000e-04  eta: 0:09:08  time: 0.4024  data_time: 0.0075  memory: 6021  grad_norm: 6.8658  loss: 3.4891  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.1123  distill.loss_dist: 2.3768
2023/05/29 19:46:33 - mmengine - INFO - Epoch(train)  [99][560/940]  lr: 1.0000e-04  eta: 0:09:00  time: 0.4045  data_time: 0.0080  memory: 6021  grad_norm: 6.7044  loss: 3.4934  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 1.1509  distill.loss_dist: 2.3424
2023/05/29 19:46:41 - mmengine - INFO - Epoch(train)  [99][580/940]  lr: 1.0000e-04  eta: 0:08:52  time: 0.4092  data_time: 0.0077  memory: 6021  grad_norm: 6.7223  loss: 3.9800  student.top1_acc: 0.7500  student.top5_acc: 0.7500  student.loss_cls: 1.2285  distill.loss_dist: 2.7515
2023/05/29 19:46:49 - mmengine - INFO - Epoch(train)  [99][600/940]  lr: 1.0000e-04  eta: 0:08:44  time: 0.4097  data_time: 0.0073  memory: 6021  grad_norm: 6.6838  loss: 3.2693  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 0.9645  distill.loss_dist: 2.3047
2023/05/29 19:46:57 - mmengine - INFO - Epoch(train)  [99][620/940]  lr: 1.0000e-04  eta: 0:08:36  time: 0.3999  data_time: 0.0075  memory: 6021  grad_norm: 6.6219  loss: 3.4313  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.1388  distill.loss_dist: 2.2925
2023/05/29 19:47:05 - mmengine - INFO - Epoch(train)  [99][640/940]  lr: 1.0000e-04  eta: 0:08:27  time: 0.4089  data_time: 0.0076  memory: 6021  grad_norm: 6.5977  loss: 3.5154  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.1303  distill.loss_dist: 2.3851
2023/05/29 19:47:13 - mmengine - INFO - Epoch(train)  [99][660/940]  lr: 1.0000e-04  eta: 0:08:19  time: 0.4096  data_time: 0.0079  memory: 6021  grad_norm: 6.7454  loss: 3.6285  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.0750  distill.loss_dist: 2.5535
2023/05/29 19:47:22 - mmengine - INFO - Epoch(train)  [99][680/940]  lr: 1.0000e-04  eta: 0:08:11  time: 0.4040  data_time: 0.0075  memory: 6021  grad_norm: 6.7226  loss: 3.6915  student.top1_acc: 0.6250  student.top5_acc: 0.7500  student.loss_cls: 1.0397  distill.loss_dist: 2.6519
2023/05/29 19:47:30 - mmengine - INFO - Epoch(train)  [99][700/940]  lr: 1.0000e-04  eta: 0:08:03  time: 0.4013  data_time: 0.0075  memory: 6021  grad_norm: 6.6827  loss: 3.4237  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.0933  distill.loss_dist: 2.3305
2023/05/29 19:47:38 - mmengine - INFO - Epoch(train)  [99][720/940]  lr: 1.0000e-04  eta: 0:07:54  time: 0.4012  data_time: 0.0078  memory: 6021  grad_norm: 6.5270  loss: 3.1495  student.top1_acc: 0.6250  student.top5_acc: 0.7500  student.loss_cls: 0.9665  distill.loss_dist: 2.1830
2023/05/29 19:47:46 - mmengine - INFO - Epoch(train)  [99][740/940]  lr: 1.0000e-04  eta: 0:07:46  time: 0.4018  data_time: 0.0079  memory: 6021  grad_norm: 6.6732  loss: 3.5604  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.1737  distill.loss_dist: 2.3867
2023/05/29 19:47:54 - mmengine - INFO - Epoch(train)  [99][760/940]  lr: 1.0000e-04  eta: 0:07:38  time: 0.4010  data_time: 0.0078  memory: 6021  grad_norm: 6.6320  loss: 3.9308  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 1.3403  distill.loss_dist: 2.5905
2023/05/29 19:48:02 - mmengine - INFO - Epoch(train)  [99][780/940]  lr: 1.0000e-04  eta: 0:07:30  time: 0.4065  data_time: 0.0075  memory: 6021  grad_norm: 6.6592  loss: 3.3717  student.top1_acc: 0.6250  student.top5_acc: 0.7500  student.loss_cls: 1.1647  distill.loss_dist: 2.2070
2023/05/29 19:48:10 - mmengine - INFO - Epoch(train)  [99][800/940]  lr: 1.0000e-04  eta: 0:07:21  time: 0.4010  data_time: 0.0076  memory: 6021  grad_norm: 6.6345  loss: 3.1227  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 1.1099  distill.loss_dist: 2.0127
2023/05/29 19:48:18 - mmengine - INFO - Epoch(train)  [99][820/940]  lr: 1.0000e-04  eta: 0:07:13  time: 0.4031  data_time: 0.0075  memory: 6021  grad_norm: 6.6249  loss: 3.1052  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 0.9652  distill.loss_dist: 2.1400
2023/05/29 19:48:26 - mmengine - INFO - Epoch(train)  [99][840/940]  lr: 1.0000e-04  eta: 0:07:05  time: 0.4102  data_time: 0.0078  memory: 6021  grad_norm: 6.8591  loss: 3.3259  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 0.9574  distill.loss_dist: 2.3686
2023/05/29 19:48:34 - mmengine - INFO - Epoch(train)  [99][860/940]  lr: 1.0000e-04  eta: 0:06:57  time: 0.4103  data_time: 0.0076  memory: 6021  grad_norm: 6.6490  loss: 3.3910  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 0.9121  distill.loss_dist: 2.4789
2023/05/29 19:48:42 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809
2023/05/29 19:48:42 - mmengine - INFO - Epoch(train)  [99][880/940]  lr: 1.0000e-04  eta: 0:06:49  time: 0.3993  data_time: 0.0078  memory: 6021  grad_norm: 6.6950  loss: 3.3466  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 0.9527  distill.loss_dist: 2.3939
2023/05/29 19:48:50 - mmengine - INFO - Epoch(train)  [99][900/940]  lr: 1.0000e-04  eta: 0:06:40  time: 0.4005  data_time: 0.0079  memory: 6021  grad_norm: 6.6353  loss: 3.4208  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 0.9294  distill.loss_dist: 2.4914
2023/05/29 19:48:58 - mmengine - INFO - Epoch(train)  [99][920/940]  lr: 1.0000e-04  eta: 0:06:32  time: 0.4014  data_time: 0.0080  memory: 6021  grad_norm: 6.8030  loss: 3.3738  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 0.9829  distill.loss_dist: 2.3909
2023/05/29 19:49:06 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809
2023/05/29 19:49:06 - mmengine - INFO - Epoch(train)  [99][940/940]  lr: 1.0000e-04  eta: 0:06:24  time: 0.3897  data_time: 0.0070  memory: 6021  grad_norm: 7.0168  loss: 3.8146  student.top1_acc: 0.5000  student.top5_acc: 0.5000  student.loss_cls: 1.3354  distill.loss_dist: 2.4792
2023/05/29 19:49:06 - mmengine - INFO - Saving checkpoint at 99 epochs
2023/05/29 19:49:15 - mmengine - INFO - Epoch(val)  [99][20/78]    eta: 0:00:18  time: 0.3242  data_time: 0.2718  memory: 1444  
2023/05/29 19:49:20 - mmengine - INFO - Epoch(val)  [99][40/78]    eta: 0:00:10  time: 0.2265  data_time: 0.1745  memory: 1444  
2023/05/29 19:49:25 - mmengine - INFO - Epoch(val)  [99][60/78]    eta: 0:00:04  time: 0.2702  data_time: 0.2195  memory: 1444  
2023/05/29 19:49:36 - mmengine - INFO - Epoch(val)  [99][20/78]    eta: 0:01:14  time: 0.3394  data_time: 0.1557  memory: 2227  
2023/05/29 19:49:41 - mmengine - INFO - Epoch(val)  [99][40/78]    eta: 0:00:28  time: 0.2379  data_time: 0.0538  memory: 2227  
2023/05/29 19:49:46 - mmengine - INFO - Epoch(val)  [99][60/78]    eta: 0:00:10  time: 0.2750  data_time: 0.0910  memory: 2227  
2023/05/29 19:49:52 - mmengine - INFO - Epoch(val) [99][78/78]    acc/top1: 0.7301  acc/top5: 0.9084  acc/mean1: 0.7301  teacher.acc/top1: 0.7727  teacher.acc/top5: 0.9298  teacher.acc/mean1: 0.7726  data_time: 0.0803  time: 0.2607
2023/05/29 19:50:02 - mmengine - INFO - Epoch(train) [100][ 20/940]  lr: 1.0000e-04  eta: 0:06:16  time: 0.5203  data_time: 0.0639  memory: 6021  grad_norm: 6.5640  loss: 3.2860  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 1.0043  distill.loss_dist: 2.2817
2023/05/29 19:50:10 - mmengine - INFO - Epoch(train) [100][ 40/940]  lr: 1.0000e-04  eta: 0:06:08  time: 0.4079  data_time: 0.0074  memory: 6021  grad_norm: 6.6722  loss: 3.6259  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.0597  distill.loss_dist: 2.5661
2023/05/29 19:50:18 - mmengine - INFO - Epoch(train) [100][ 60/940]  lr: 1.0000e-04  eta: 0:06:00  time: 0.4015  data_time: 0.0078  memory: 6021  grad_norm: 6.7225  loss: 3.6314  student.top1_acc: 0.6250  student.top5_acc: 1.0000  student.loss_cls: 1.0868  distill.loss_dist: 2.5446
2023/05/29 19:50:26 - mmengine - INFO - Epoch(train) [100][ 80/940]  lr: 1.0000e-04  eta: 0:05:52  time: 0.4076  data_time: 0.0075  memory: 6021  grad_norm: 6.6391  loss: 3.3159  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 0.9660  distill.loss_dist: 2.3500
2023/05/29 19:50:34 - mmengine - INFO - Epoch(train) [100][100/940]  lr: 1.0000e-04  eta: 0:05:43  time: 0.4011  data_time: 0.0076  memory: 6021  grad_norm: 6.5560  loss: 3.6439  student.top1_acc: 0.5000  student.top5_acc: 0.7500  student.loss_cls: 1.1835  distill.loss_dist: 2.4604
2023/05/29 19:50:43 - mmengine - INFO - Epoch(train) [100][120/940]  lr: 1.0000e-04  eta: 0:05:35  time: 0.4017  data_time: 0.0077  memory: 6021  grad_norm: 6.6064  loss: 3.6195  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.0812  distill.loss_dist: 2.5384
2023/05/29 19:50:51 - mmengine - INFO - Epoch(train) [100][140/940]  lr: 1.0000e-04  eta: 0:05:27  time: 0.4018  data_time: 0.0074  memory: 6021  grad_norm: 6.6118  loss: 3.2936  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 0.9966  distill.loss_dist: 2.2970
2023/05/29 19:50:59 - mmengine - INFO - Epoch(train) [100][160/940]  lr: 1.0000e-04  eta: 0:05:19  time: 0.4093  data_time: 0.0078  memory: 6021  grad_norm: 6.6566  loss: 3.3778  student.top1_acc: 0.6250  student.top5_acc: 0.7500  student.loss_cls: 1.0720  distill.loss_dist: 2.3057
2023/05/29 19:51:07 - mmengine - INFO - Epoch(train) [100][180/940]  lr: 1.0000e-04  eta: 0:05:11  time: 0.4016  data_time: 0.0076  memory: 6021  grad_norm: 6.6429  loss: 3.7370  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 1.3158  distill.loss_dist: 2.4212
2023/05/29 19:51:15 - mmengine - INFO - Epoch(train) [100][200/940]  lr: 1.0000e-04  eta: 0:05:02  time: 0.4086  data_time: 0.0082  memory: 6021  grad_norm: 6.6441  loss: 3.6872  student.top1_acc: 0.5000  student.top5_acc: 0.7500  student.loss_cls: 1.3204  distill.loss_dist: 2.3668
2023/05/29 19:51:23 - mmengine - INFO - Epoch(train) [100][220/940]  lr: 1.0000e-04  eta: 0:04:54  time: 0.4059  data_time: 0.0080  memory: 6021  grad_norm: 6.8146  loss: 3.3988  student.top1_acc: 0.6250  student.top5_acc: 0.7500  student.loss_cls: 1.0299  distill.loss_dist: 2.3690
2023/05/29 19:51:31 - mmengine - INFO - Epoch(train) [100][240/940]  lr: 1.0000e-04  eta: 0:04:46  time: 0.4087  data_time: 0.0074  memory: 6021  grad_norm: 6.7500  loss: 3.6211  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 1.2076  distill.loss_dist: 2.4135
2023/05/29 19:51:39 - mmengine - INFO - Epoch(train) [100][260/940]  lr: 1.0000e-04  eta: 0:04:38  time: 0.4098  data_time: 0.0080  memory: 6021  grad_norm: 6.6846  loss: 3.6462  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.1434  distill.loss_dist: 2.5028
2023/05/29 19:51:48 - mmengine - INFO - Epoch(train) [100][280/940]  lr: 1.0000e-04  eta: 0:04:30  time: 0.4085  data_time: 0.0076  memory: 6021  grad_norm: 6.7084  loss: 3.2385  student.top1_acc: 0.6250  student.top5_acc: 0.7500  student.loss_cls: 1.1064  distill.loss_dist: 2.1320
2023/05/29 19:51:56 - mmengine - INFO - Epoch(train) [100][300/940]  lr: 1.0000e-04  eta: 0:04:21  time: 0.4038  data_time: 0.0069  memory: 6021  grad_norm: 6.8262  loss: 3.5116  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.2196  distill.loss_dist: 2.2920
2023/05/29 19:52:04 - mmengine - INFO - Epoch(train) [100][320/940]  lr: 1.0000e-04  eta: 0:04:13  time: 0.4141  data_time: 0.0075  memory: 6021  grad_norm: 6.7540  loss: 3.3487  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.0830  distill.loss_dist: 2.2657
2023/05/29 19:52:12 - mmengine - INFO - Epoch(train) [100][340/940]  lr: 1.0000e-04  eta: 0:04:05  time: 0.3999  data_time: 0.0079  memory: 6021  grad_norm: 6.7524  loss: 2.9074  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 0.7916  distill.loss_dist: 2.1158
2023/05/29 19:52:20 - mmengine - INFO - Epoch(train) [100][360/940]  lr: 1.0000e-04  eta: 0:03:57  time: 0.4013  data_time: 0.0078  memory: 6021  grad_norm: 6.8844  loss: 3.2534  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 0.8759  distill.loss_dist: 2.3775
2023/05/29 19:52:28 - mmengine - INFO - Epoch(train) [100][380/940]  lr: 1.0000e-04  eta: 0:03:49  time: 0.4003  data_time: 0.0076  memory: 6021  grad_norm: 6.7551  loss: 3.3924  student.top1_acc: 0.7500  student.top5_acc: 0.7500  student.loss_cls: 1.2244  distill.loss_dist: 2.1680
2023/05/29 19:52:36 - mmengine - INFO - Epoch(train) [100][400/940]  lr: 1.0000e-04  eta: 0:03:40  time: 0.4009  data_time: 0.0079  memory: 6021  grad_norm: 6.7060  loss: 3.5687  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.0190  distill.loss_dist: 2.5497
2023/05/29 19:52:44 - mmengine - INFO - Epoch(train) [100][420/940]  lr: 1.0000e-04  eta: 0:03:32  time: 0.4025  data_time: 0.0080  memory: 6021  grad_norm: 6.5740  loss: 3.5382  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 1.0017  distill.loss_dist: 2.5364
2023/05/29 19:52:52 - mmengine - INFO - Epoch(train) [100][440/940]  lr: 1.0000e-04  eta: 0:03:24  time: 0.4072  data_time: 0.0076  memory: 6021  grad_norm: 6.5829  loss: 3.4158  student.top1_acc: 0.5000  student.top5_acc: 0.7500  student.loss_cls: 1.1492  distill.loss_dist: 2.2666
2023/05/29 19:53:00 - mmengine - INFO - Epoch(train) [100][460/940]  lr: 1.0000e-04  eta: 0:03:16  time: 0.4003  data_time: 0.0079  memory: 6021  grad_norm: 6.8636  loss: 3.1364  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 0.8564  distill.loss_dist: 2.2800
2023/05/29 19:53:08 - mmengine - INFO - Epoch(train) [100][480/940]  lr: 1.0000e-04  eta: 0:03:08  time: 0.4084  data_time: 0.0073  memory: 6021  grad_norm: 6.8285  loss: 3.3883  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.0149  distill.loss_dist: 2.3734
2023/05/29 19:53:17 - mmengine - INFO - Epoch(train) [100][500/940]  lr: 1.0000e-04  eta: 0:02:59  time: 0.4068  data_time: 0.0074  memory: 6021  grad_norm: 6.7985  loss: 3.6195  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.1324  distill.loss_dist: 2.4871
2023/05/29 19:53:25 - mmengine - INFO - Epoch(train) [100][520/940]  lr: 1.0000e-04  eta: 0:02:51  time: 0.4002  data_time: 0.0074  memory: 6021  grad_norm: 6.9088  loss: 3.7382  student.top1_acc: 0.6250  student.top5_acc: 0.7500  student.loss_cls: 1.1994  distill.loss_dist: 2.5388
2023/05/29 19:53:33 - mmengine - INFO - Epoch(train) [100][540/940]  lr: 1.0000e-04  eta: 0:02:43  time: 0.4015  data_time: 0.0077  memory: 6021  grad_norm: 6.8663  loss: 3.4412  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.2751  distill.loss_dist: 2.1661
2023/05/29 19:53:41 - mmengine - INFO - Epoch(train) [100][560/940]  lr: 1.0000e-04  eta: 0:02:35  time: 0.4124  data_time: 0.0074  memory: 6021  grad_norm: 6.8081  loss: 3.7097  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.0379  distill.loss_dist: 2.6718
2023/05/29 19:53:49 - mmengine - INFO - Epoch(train) [100][580/940]  lr: 1.0000e-04  eta: 0:02:27  time: 0.4102  data_time: 0.0078  memory: 6021  grad_norm: 6.6122  loss: 3.3554  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 0.9057  distill.loss_dist: 2.4497
2023/05/29 19:53:57 - mmengine - INFO - Epoch(train) [100][600/940]  lr: 1.0000e-04  eta: 0:02:18  time: 0.4004  data_time: 0.0075  memory: 6021  grad_norm: 6.5408  loss: 3.8258  student.top1_acc: 0.6250  student.top5_acc: 0.7500  student.loss_cls: 1.1851  distill.loss_dist: 2.6407
2023/05/29 19:54:05 - mmengine - INFO - Epoch(train) [100][620/940]  lr: 1.0000e-04  eta: 0:02:10  time: 0.4024  data_time: 0.0073  memory: 6021  grad_norm: 6.6627  loss: 3.6948  student.top1_acc: 0.5000  student.top5_acc: 0.7500  student.loss_cls: 1.0571  distill.loss_dist: 2.6377
2023/05/29 19:54:13 - mmengine - INFO - Epoch(train) [100][640/940]  lr: 1.0000e-04  eta: 0:02:02  time: 0.4020  data_time: 0.0076  memory: 6021  grad_norm: 6.5909  loss: 3.5604  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 1.1832  distill.loss_dist: 2.3772
2023/05/29 19:54:21 - mmengine - INFO - Epoch(train) [100][660/940]  lr: 1.0000e-04  eta: 0:01:54  time: 0.4020  data_time: 0.0079  memory: 6021  grad_norm: 6.7040  loss: 3.0740  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 0.9197  distill.loss_dist: 2.1543
2023/05/29 19:54:29 - mmengine - INFO - Epoch(train) [100][680/940]  lr: 1.0000e-04  eta: 0:01:46  time: 0.4012  data_time: 0.0075  memory: 6021  grad_norm: 6.7818  loss: 3.2584  student.top1_acc: 1.0000  student.top5_acc: 1.0000  student.loss_cls: 0.9521  distill.loss_dist: 2.3063
2023/05/29 19:54:37 - mmengine - INFO - Epoch(train) [100][700/940]  lr: 1.0000e-04  eta: 0:01:38  time: 0.4005  data_time: 0.0077  memory: 6021  grad_norm: 6.6493  loss: 3.3947  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 1.0773  distill.loss_dist: 2.3174
2023/05/29 19:54:45 - mmengine - INFO - Epoch(train) [100][720/940]  lr: 1.0000e-04  eta: 0:01:29  time: 0.4012  data_time: 0.0075  memory: 6021  grad_norm: 6.7105  loss: 3.3163  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.1346  distill.loss_dist: 2.1817
2023/05/29 19:54:53 - mmengine - INFO - Epoch(train) [100][740/940]  lr: 1.0000e-04  eta: 0:01:21  time: 0.4015  data_time: 0.0076  memory: 6021  grad_norm: 6.6816  loss: 3.2958  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 0.9234  distill.loss_dist: 2.3724
2023/05/29 19:55:02 - mmengine - INFO - Epoch(train) [100][760/940]  lr: 1.0000e-04  eta: 0:01:13  time: 0.4091  data_time: 0.0075  memory: 6021  grad_norm: 6.6640  loss: 3.6503  student.top1_acc: 0.6250  student.top5_acc: 0.8750  student.loss_cls: 1.1211  distill.loss_dist: 2.5292
2023/05/29 19:55:10 - mmengine - INFO - Epoch(train) [100][780/940]  lr: 1.0000e-04  eta: 0:01:05  time: 0.4086  data_time: 0.0074  memory: 6021  grad_norm: 6.5427  loss: 3.4866  student.top1_acc: 0.8750  student.top5_acc: 0.8750  student.loss_cls: 1.1649  distill.loss_dist: 2.3217
2023/05/29 19:55:18 - mmengine - INFO - Epoch(train) [100][800/940]  lr: 1.0000e-04  eta: 0:00:57  time: 0.4093  data_time: 0.0076  memory: 6021  grad_norm: 6.9517  loss: 3.3781  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 0.9562  distill.loss_dist: 2.4218
2023/05/29 19:55:26 - mmengine - INFO - Epoch(train) [100][820/940]  lr: 1.0000e-04  eta: 0:00:49  time: 0.4099  data_time: 0.0078  memory: 6021  grad_norm: 6.7308  loss: 3.5321  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 0.9932  distill.loss_dist: 2.5389
2023/05/29 19:55:34 - mmengine - INFO - Epoch(train) [100][840/940]  lr: 1.0000e-04  eta: 0:00:40  time: 0.4014  data_time: 0.0076  memory: 6021  grad_norm: 6.5742  loss: 3.6169  student.top1_acc: 0.8750  student.top5_acc: 1.0000  student.loss_cls: 0.9482  distill.loss_dist: 2.6687
2023/05/29 19:55:42 - mmengine - INFO - Epoch(train) [100][860/940]  lr: 1.0000e-04  eta: 0:00:32  time: 0.4094  data_time: 0.0078  memory: 6021  grad_norm: 6.5108  loss: 3.3429  student.top1_acc: 0.3750  student.top5_acc: 0.7500  student.loss_cls: 1.0236  distill.loss_dist: 2.3193
2023/05/29 19:55:50 - mmengine - INFO - Epoch(train) [100][880/940]  lr: 1.0000e-04  eta: 0:00:24  time: 0.4063  data_time: 0.0075  memory: 6021  grad_norm: 6.8509  loss: 3.6765  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.1648  distill.loss_dist: 2.5117
2023/05/29 19:55:58 - mmengine - INFO - Epoch(train) [100][900/940]  lr: 1.0000e-04  eta: 0:00:16  time: 0.3998  data_time: 0.0072  memory: 6021  grad_norm: 6.6916  loss: 3.5516  student.top1_acc: 0.7500  student.top5_acc: 0.8750  student.loss_cls: 1.1698  distill.loss_dist: 2.3818
2023/05/29 19:56:07 - mmengine - INFO - Epoch(train) [100][920/940]  lr: 1.0000e-04  eta: 0:00:08  time: 0.4091  data_time: 0.0075  memory: 6021  grad_norm: 6.7991  loss: 3.3925  student.top1_acc: 0.7500  student.top5_acc: 1.0000  student.loss_cls: 0.8621  distill.loss_dist: 2.5304
2023/05/29 19:56:14 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809
2023/05/29 19:56:14 - mmengine - INFO - Epoch(train) [100][940/940]  lr: 1.0000e-04  eta: 0:00:00  time: 0.3814  data_time: 0.0075  memory: 6021  grad_norm: 6.7489  loss: 3.2966  student.top1_acc: 0.5000  student.top5_acc: 1.0000  student.loss_cls: 1.0397  distill.loss_dist: 2.2569
2023/05/29 19:56:14 - mmengine - INFO - Saving checkpoint at 100 epochs
2023/05/29 19:56:24 - mmengine - INFO - Epoch(val) [100][20/78]    eta: 0:00:18  time: 0.3142  data_time: 0.2625  memory: 1444  
2023/05/29 19:56:28 - mmengine - INFO - Epoch(val) [100][40/78]    eta: 0:00:10  time: 0.2219  data_time: 0.1709  memory: 1444  
2023/05/29 19:56:33 - mmengine - INFO - Epoch(val) [100][60/78]    eta: 0:00:04  time: 0.2496  data_time: 0.1995  memory: 1444  
2023/05/29 19:56:44 - mmengine - INFO - Epoch(val) [100][20/78]    eta: 0:01:12  time: 0.3370  data_time: 0.1538  memory: 2227  
2023/05/29 19:56:49 - mmengine - INFO - Epoch(val) [100][40/78]    eta: 0:00:28  time: 0.2531  data_time: 0.0698  memory: 2227  
2023/05/29 19:56:54 - mmengine - INFO - Epoch(val) [100][60/78]    eta: 0:00:10  time: 0.2678  data_time: 0.0846  memory: 2227  
2023/05/29 19:56:59 - mmengine - INFO - Epoch(val) [100][78/78]    acc/top1: 0.7311  acc/top5: 0.9088  acc/mean1: 0.7310  teacher.acc/top1: 0.7727  teacher.acc/top5: 0.9298  teacher.acc/mean1: 0.7726  data_time: 0.0812  time: 0.2610
2023/05/29 19:56:59 - mmengine - INFO - The previous best checkpoint /mnt/data/mmact/lilin/Repos/mmaction2/work_dirs/tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4/best_acc_top1_epoch_84.pth is removed
2023/05/29 19:57:01 - mmengine - INFO - The best checkpoint with 0.7311 acc/top1 at 100 epoch is saved to best_acc_top1_epoch_100.pth.