2023/05/29 19:28:13 - mmengine - INFO - ------------------------------------------------------------ System environment: sys.platform: linux Python: 3.9.0 (default, Nov 15 2020, 14:28:56) [GCC 7.3.0] CUDA available: True numpy_random_seed: 23917156 GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090 CUDA_HOME: /usr/local/cuda NVCC: Cuda compilation tools, release 11.3, V11.3.109 GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 PyTorch: 1.12.1+cu113 PyTorch compiling details: PyTorch built with: - GCC 9.3 - C++ Version: 201402 - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815) - OpenMP 201511 (a.k.a. OpenMP 4.5) - LAPACK is enabled (usually provided by MKL) - NNPACK is enabled - CPU capability usage: AVX2 - CUDA Runtime 11.3 - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86 - CuDNN 8.2 - Built with CuDNN 8.3.2 - Magma 2.5.2 - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls 
-Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, TorchVision: 0.13.1+cu102 OpenCV: 4.7.0 MMEngine: 0.7.3 Runtime environment: cudnn_benchmark: False mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0} dist_cfg: {'backend': 'nccl'} seed: None diff_rank_seed: False deterministic: False Distributed launcher: pytorch Distributed training: True GPU number: 32 ------------------------------------------------------------ 2023/05/29 19:28:14 - mmengine - INFO - Config: model = dict( _scope_='mmrazor', type='SingleTeacherDistill', architecture=dict( cfg_path= 'mmaction::recognition/tsn/tsn_imagenet-pretrained-r50_8xb32-1x1x8-100e_kinetics400-rgb.py', backbone=dict(pretrained=False), pretrained=False), teacher=dict( cfg_path= 'mmaction::recognition/tsn/custom_backbones/tsn_imagenet-pretrained-swin-transformer_32xb8-1x1x8-50e_kinetics400-rgb.py', pretrained=False), teacher_ckpt= 'work_dirs/tsn_imagenet-pretrained-swin-transformer_32xb8-1x1x8-50e_kinetics400-rgb/best_acc_top1_epoch_47.pth', distiller=dict( type='ConfigurableDistiller', student_recorders=dict( logits=dict(type='ModuleOutputs', source='cls_head.fc_cls')), teacher_recorders=dict( logits=dict(type='ModuleOutputs', source='cls_head.fc_cls')), distill_losses=dict( loss_dist=dict( type='DISTLoss', inter_loss_weight=1.0, intra_loss_weight=1.0, tau=1, loss_weight=4)), loss_forward_mappings=dict( loss_dist=dict( logits_S=dict(from_student=True, recorder='logits'), logits_T=dict(from_student=False, recorder='logits'))))) train_cfg = dict( type='EpochBasedTrainLoop', 
max_epochs=100, val_begin=1, val_interval=1, _scope_='mmaction') val_cfg = dict(type='mmrazor.SingleTeacherDistillValLoop') test_cfg = dict(type='TestLoop', _scope_='mmaction') param_scheduler = [ dict( type='MultiStepLR', begin=0, end=100, by_epoch=True, milestones=[40, 80], gamma=0.1, _scope_='mmaction') ] optim_wrapper = dict( optimizer=dict( type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001, _scope_='mmaction'), clip_grad=dict(max_norm=40, norm_type=2)) default_scope = 'mmaction' default_hooks = dict( runtime_info=dict(type='RuntimeInfoHook', _scope_='mmaction'), timer=dict(type='IterTimerHook', _scope_='mmaction'), logger=dict( type='LoggerHook', interval=20, ignore_last=False, _scope_='mmaction'), param_scheduler=dict(type='ParamSchedulerHook', _scope_='mmaction'), checkpoint=dict( type='CheckpointHook', interval=3, save_best='auto', max_keep_ckpts=3, _scope_='mmaction'), sampler_seed=dict(type='DistSamplerSeedHook', _scope_='mmaction'), sync_buffers=dict(type='SyncBuffersHook', _scope_='mmaction')) env_cfg = dict( cudnn_benchmark=False, mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0), dist_cfg=dict(backend='nccl')) log_processor = dict( type='LogProcessor', window_size=20, by_epoch=True, _scope_='mmaction') vis_backends = [dict(type='LocalVisBackend', _scope_='mmaction')] visualizer = dict( type='ActionVisualizer', vis_backends=[dict(type='LocalVisBackend')], _scope_='mmaction') log_level = 'INFO' load_from = None resume = True dataset_type = 'VideoDataset' data_root = 'data/kinetics400/videos_train' data_root_val = 'data/kinetics400/videos_val' ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt' ann_file_val = 'data/kinetics400/kinetics400_val_list_videos.txt' file_client_args = dict(io_backend='disk') train_pipeline = [ dict(type='DecordInit', io_backend='disk', _scope_='mmaction'), dict( type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8, _scope_='mmaction'), dict(type='DecordDecode', _scope_='mmaction'), 
dict(type='Resize', scale=(-1, 256), _scope_='mmaction'), dict( type='MultiScaleCrop', input_size=224, scales=(1, 0.875, 0.75, 0.66), random_crop=False, max_wh_scale_gap=1, _scope_='mmaction'), dict( type='Resize', scale=(224, 224), keep_ratio=False, _scope_='mmaction'), dict(type='Flip', flip_ratio=0.5, _scope_='mmaction'), dict(type='FormatShape', input_format='NCHW', _scope_='mmaction'), dict(type='PackActionInputs', _scope_='mmaction') ] val_pipeline = [ dict(type='DecordInit', io_backend='disk', _scope_='mmaction'), dict( type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8, test_mode=True, _scope_='mmaction'), dict(type='DecordDecode', _scope_='mmaction'), dict(type='Resize', scale=(-1, 256), _scope_='mmaction'), dict(type='CenterCrop', crop_size=224, _scope_='mmaction'), dict(type='FormatShape', input_format='NCHW', _scope_='mmaction'), dict(type='PackActionInputs', _scope_='mmaction') ] test_pipeline = [ dict(type='DecordInit', io_backend='disk', _scope_='mmaction'), dict( type='SampleFrames', clip_len=1, frame_interval=1, num_clips=25, test_mode=True, _scope_='mmaction'), dict(type='DecordDecode', _scope_='mmaction'), dict(type='Resize', scale=(-1, 256), _scope_='mmaction'), dict(type='TenCrop', crop_size=224, _scope_='mmaction'), dict(type='FormatShape', input_format='NCHW', _scope_='mmaction'), dict(type='PackActionInputs', _scope_='mmaction') ] train_dataloader = dict( batch_size=8, num_workers=8, persistent_workers=True, sampler=dict(type='DefaultSampler', shuffle=True, _scope_='mmaction'), dataset=dict( type='VideoDataset', ann_file='data/kinetics400/kinetics400_train_list_videos.txt', data_prefix=dict(video='data/kinetics400/videos_train'), pipeline=[ dict(type='DecordInit', io_backend='disk'), dict( type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8), dict(type='DecordDecode'), dict(type='Resize', scale=(-1, 256)), dict( type='MultiScaleCrop', input_size=224, scales=(1, 0.875, 0.75, 0.66), random_crop=False, 
max_wh_scale_gap=1), dict(type='Resize', scale=(224, 224), keep_ratio=False), dict(type='Flip', flip_ratio=0.5), dict(type='FormatShape', input_format='NCHW'), dict(type='PackActionInputs') ], _scope_='mmaction')) val_dataloader = dict( batch_size=8, num_workers=8, persistent_workers=True, sampler=dict(type='DefaultSampler', shuffle=False, _scope_='mmaction'), dataset=dict( type='VideoDataset', ann_file='data/kinetics400/kinetics400_val_list_videos.txt', data_prefix=dict(video='data/kinetics400/videos_val'), pipeline=[ dict(type='DecordInit', io_backend='disk'), dict( type='SampleFrames', clip_len=1, frame_interval=1, num_clips=8, test_mode=True), dict(type='DecordDecode'), dict(type='Resize', scale=(-1, 256)), dict(type='CenterCrop', crop_size=224), dict(type='FormatShape', input_format='NCHW'), dict(type='PackActionInputs') ], test_mode=True, _scope_='mmaction')) test_dataloader = dict( batch_size=1, num_workers=8, persistent_workers=True, sampler=dict(type='DefaultSampler', shuffle=False, _scope_='mmaction'), dataset=dict( type='VideoDataset', ann_file='data/kinetics400/kinetics400_val_list_videos.txt', data_prefix=dict(video='data/kinetics400/videos_val'), pipeline=[ dict(type='DecordInit', io_backend='disk'), dict( type='SampleFrames', clip_len=1, frame_interval=1, num_clips=25, test_mode=True), dict(type='DecordDecode'), dict(type='Resize', scale=(-1, 256)), dict(type='TenCrop', crop_size=224), dict(type='FormatShape', input_format='NCHW'), dict(type='PackActionInputs') ], test_mode=True, _scope_='mmaction')) val_evaluator = dict(type='AccMetric', _scope_='mmaction') test_evaluator = dict(type='AccMetric', _scope_='mmaction') auto_scale_lr = dict(enable=True, base_batch_size=256) teacher_ckpt = 'work_dirs/tsn_imagenet-pretrained-swin-transformer_32xb8-1x1x8-50e_kinetics400-rgb/best_acc_top1_epoch_47.pth' launcher = 'pytorch' work_dir = './work_dirs/tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4' randomness = dict(seed=None, diff_rank_seed=False, 
deterministic=False) 2023/05/29 19:28:22 - mmengine - INFO - Hooks will be executed in the following order: before_run: (VERY_HIGH ) RuntimeInfoHook (BELOW_NORMAL) LoggerHook -------------------- before_train: (VERY_HIGH ) RuntimeInfoHook (NORMAL ) IterTimerHook (VERY_LOW ) CheckpointHook -------------------- before_train_epoch: (VERY_HIGH ) RuntimeInfoHook (NORMAL ) IterTimerHook (NORMAL ) DistSamplerSeedHook -------------------- before_train_iter: (VERY_HIGH ) RuntimeInfoHook (NORMAL ) IterTimerHook -------------------- after_train_iter: (VERY_HIGH ) RuntimeInfoHook (NORMAL ) IterTimerHook (BELOW_NORMAL) LoggerHook (LOW ) ParamSchedulerHook (VERY_LOW ) CheckpointHook -------------------- after_train_epoch: (NORMAL ) IterTimerHook (NORMAL ) SyncBuffersHook (LOW ) ParamSchedulerHook (VERY_LOW ) CheckpointHook -------------------- before_val_epoch: (NORMAL ) IterTimerHook (NORMAL ) SyncBuffersHook -------------------- before_val_iter: (NORMAL ) IterTimerHook -------------------- after_val_iter: (NORMAL ) IterTimerHook (BELOW_NORMAL) LoggerHook -------------------- after_val_epoch: (VERY_HIGH ) RuntimeInfoHook (NORMAL ) IterTimerHook (BELOW_NORMAL) LoggerHook (LOW ) ParamSchedulerHook (VERY_LOW ) CheckpointHook -------------------- after_train: (VERY_LOW ) CheckpointHook -------------------- before_test_epoch: (NORMAL ) IterTimerHook -------------------- before_test_iter: (NORMAL ) IterTimerHook -------------------- after_test_iter: (NORMAL ) IterTimerHook (BELOW_NORMAL) LoggerHook -------------------- after_test_epoch: (VERY_HIGH ) RuntimeInfoHook (NORMAL ) IterTimerHook (BELOW_NORMAL) LoggerHook -------------------- after_run: (BELOW_NORMAL) LoggerHook -------------------- 2023/05/29 19:28:23 - mmengine - INFO - LR is set based on batch size of 256 and the current batch size is 256. Scaling the original LR by 1.0. 
2023/05/29 19:28:24 - mmengine - INFO - These parameters in pretrained checkpoint are not loaded: {'fc.weight', 'fc.bias'} 2023/05/29 19:28:24 - mmengine - WARNING - init_weights of Recognizer2D has been called more than once. Name of parameter - Initialization information architecture.backbone.conv1.conv.weight - torch.Size([64, 3, 7, 7]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.conv1.bn.weight - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.conv1.bn.bias - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.0.conv1.conv.weight - torch.Size([64, 64, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.0.conv1.bn.weight - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.0.conv1.bn.bias - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.0.conv2.conv.weight - torch.Size([64, 64, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.0.conv2.bn.weight - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.0.conv2.bn.bias - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.0.conv3.conv.weight - torch.Size([256, 64, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.0.conv3.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.0.conv3.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.0.downsample.conv.weight - torch.Size([256, 64, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.0.downsample.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet 
architecture.backbone.layer1.0.downsample.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.1.conv1.conv.weight - torch.Size([64, 256, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.1.conv1.bn.weight - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.1.conv1.bn.bias - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.1.conv2.conv.weight - torch.Size([64, 64, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.1.conv2.bn.weight - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.1.conv2.bn.bias - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.1.conv3.conv.weight - torch.Size([256, 64, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.1.conv3.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.1.conv3.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.2.conv1.conv.weight - torch.Size([64, 256, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.2.conv1.bn.weight - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.2.conv1.bn.bias - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.2.conv2.conv.weight - torch.Size([64, 64, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.2.conv2.bn.weight - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.2.conv2.bn.bias - torch.Size([64]): Initialized by user-defined `init_weights` in ResNet 
architecture.backbone.layer1.2.conv3.conv.weight - torch.Size([256, 64, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.2.conv3.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer1.2.conv3.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.0.conv1.conv.weight - torch.Size([128, 256, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.0.conv1.bn.weight - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.0.conv1.bn.bias - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.0.conv2.conv.weight - torch.Size([128, 128, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.0.conv2.bn.weight - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.0.conv2.bn.bias - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.0.conv3.conv.weight - torch.Size([512, 128, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.0.conv3.bn.weight - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.0.conv3.bn.bias - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.0.downsample.conv.weight - torch.Size([512, 256, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.0.downsample.bn.weight - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.0.downsample.bn.bias - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.1.conv1.conv.weight - torch.Size([128, 512, 1, 1]): Initialized by user-defined 
`init_weights` in ResNet architecture.backbone.layer2.1.conv1.bn.weight - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.1.conv1.bn.bias - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.1.conv2.conv.weight - torch.Size([128, 128, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.1.conv2.bn.weight - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.1.conv2.bn.bias - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.1.conv3.conv.weight - torch.Size([512, 128, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.1.conv3.bn.weight - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.1.conv3.bn.bias - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.2.conv1.conv.weight - torch.Size([128, 512, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.2.conv1.bn.weight - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.2.conv1.bn.bias - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.2.conv2.conv.weight - torch.Size([128, 128, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.2.conv2.bn.weight - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.2.conv2.bn.bias - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.2.conv3.conv.weight - torch.Size([512, 128, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.2.conv3.bn.weight - torch.Size([512]): Initialized by user-defined 
`init_weights` in ResNet architecture.backbone.layer2.2.conv3.bn.bias - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.3.conv1.conv.weight - torch.Size([128, 512, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.3.conv1.bn.weight - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.3.conv1.bn.bias - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.3.conv2.conv.weight - torch.Size([128, 128, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.3.conv2.bn.weight - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.3.conv2.bn.bias - torch.Size([128]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.3.conv3.conv.weight - torch.Size([512, 128, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.3.conv3.bn.weight - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer2.3.conv3.bn.bias - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.0.conv1.conv.weight - torch.Size([256, 512, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.0.conv1.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.0.conv1.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.0.conv2.conv.weight - torch.Size([256, 256, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.0.conv2.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.0.conv2.bn.bias - torch.Size([256]): Initialized by user-defined 
`init_weights` in ResNet architecture.backbone.layer3.0.conv3.conv.weight - torch.Size([1024, 256, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.0.conv3.bn.weight - torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.0.conv3.bn.bias - torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.0.downsample.conv.weight - torch.Size([1024, 512, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.0.downsample.bn.weight - torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.0.downsample.bn.bias - torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.1.conv1.conv.weight - torch.Size([256, 1024, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.1.conv1.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.1.conv1.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.1.conv2.conv.weight - torch.Size([256, 256, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.1.conv2.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.1.conv2.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.1.conv3.conv.weight - torch.Size([1024, 256, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.1.conv3.bn.weight - torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.1.conv3.bn.bias - torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.2.conv1.conv.weight - torch.Size([256, 1024, 1, 
1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.2.conv1.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.2.conv1.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.2.conv2.conv.weight - torch.Size([256, 256, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.2.conv2.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.2.conv2.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.2.conv3.conv.weight - torch.Size([1024, 256, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.2.conv3.bn.weight - torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.2.conv3.bn.bias - torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.3.conv1.conv.weight - torch.Size([256, 1024, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.3.conv1.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.3.conv1.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.3.conv2.conv.weight - torch.Size([256, 256, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.3.conv2.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.3.conv2.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.3.conv3.conv.weight - torch.Size([1024, 256, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.3.conv3.bn.weight - 
torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.3.conv3.bn.bias - torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.4.conv1.conv.weight - torch.Size([256, 1024, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.4.conv1.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.4.conv1.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.4.conv2.conv.weight - torch.Size([256, 256, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.4.conv2.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.4.conv2.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.4.conv3.conv.weight - torch.Size([1024, 256, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.4.conv3.bn.weight - torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.4.conv3.bn.bias - torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.5.conv1.conv.weight - torch.Size([256, 1024, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.5.conv1.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.5.conv1.bn.bias - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.5.conv2.conv.weight - torch.Size([256, 256, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.5.conv2.bn.weight - torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.5.conv2.bn.bias - 
torch.Size([256]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.5.conv3.conv.weight - torch.Size([1024, 256, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.5.conv3.bn.weight - torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer3.5.conv3.bn.bias - torch.Size([1024]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.0.conv1.conv.weight - torch.Size([512, 1024, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.0.conv1.bn.weight - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.0.conv1.bn.bias - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.0.conv2.conv.weight - torch.Size([512, 512, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.0.conv2.bn.weight - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.0.conv2.bn.bias - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.0.conv3.conv.weight - torch.Size([2048, 512, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.0.conv3.bn.weight - torch.Size([2048]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.0.conv3.bn.bias - torch.Size([2048]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.0.downsample.conv.weight - torch.Size([2048, 1024, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.0.downsample.bn.weight - torch.Size([2048]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.0.downsample.bn.bias - torch.Size([2048]): Initialized by user-defined `init_weights` in ResNet 
architecture.backbone.layer4.1.conv1.conv.weight - torch.Size([512, 2048, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.1.conv1.bn.weight - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.1.conv1.bn.bias - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.1.conv2.conv.weight - torch.Size([512, 512, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.1.conv2.bn.weight - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.1.conv2.bn.bias - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.1.conv3.conv.weight - torch.Size([2048, 512, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.1.conv3.bn.weight - torch.Size([2048]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.1.conv3.bn.bias - torch.Size([2048]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.2.conv1.conv.weight - torch.Size([512, 2048, 1, 1]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.2.conv1.bn.weight - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.2.conv1.bn.bias - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.2.conv2.conv.weight - torch.Size([512, 512, 3, 3]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.2.conv2.bn.weight - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.2.conv2.bn.bias - torch.Size([512]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.2.conv3.conv.weight - torch.Size([2048, 512, 1, 1]): Initialized by user-defined `init_weights` 
in ResNet architecture.backbone.layer4.2.conv3.bn.weight - torch.Size([2048]): Initialized by user-defined `init_weights` in ResNet architecture.backbone.layer4.2.conv3.bn.bias - torch.Size([2048]): Initialized by user-defined `init_weights` in ResNet architecture.cls_head.fc_cls.weight - torch.Size([400, 2048]): Initialized by user-defined `init_weights` in TSNHead architecture.cls_head.fc_cls.bias - torch.Size([400]): Initialized by user-defined `init_weights` in TSNHead teacher.backbone.patch_embed.proj.weight - torch.Size([128, 3, 4, 4]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.patch_embed.proj.bias - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.patch_embed.norm.weight - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.patch_embed.norm.bias - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.0.norm1.weight - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.0.norm1.bias - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.0.attn.relative_position_bias_table - torch.Size([169, 4]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.0.attn.qkv.weight - torch.Size([384, 128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.0.attn.qkv.bias - torch.Size([384]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.0.attn.proj.weight - torch.Size([128, 128]): The value is the same before and after 
calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.0.attn.proj.bias - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.0.norm2.weight - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.0.norm2.bias - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.0.mlp.fc1.weight - torch.Size([512, 128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.0.mlp.fc1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.0.mlp.fc2.weight - torch.Size([128, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.0.mlp.fc2.bias - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.1.norm1.weight - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.1.norm1.bias - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.1.attn.relative_position_bias_table - torch.Size([169, 4]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.1.attn.qkv.weight - torch.Size([384, 128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.1.attn.qkv.bias - torch.Size([384]): The value is the same before and after calling `init_weights` of SingleTeacherDistill 
teacher.backbone.layers.0.blocks.1.attn.proj.weight - torch.Size([128, 128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.1.attn.proj.bias - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.1.norm2.weight - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.1.norm2.bias - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.1.mlp.fc1.weight - torch.Size([512, 128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.1.mlp.fc1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.1.mlp.fc2.weight - torch.Size([128, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.blocks.1.mlp.fc2.bias - torch.Size([128]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.downsample.norm.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.downsample.norm.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.0.downsample.reduction.weight - torch.Size([256, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.0.norm1.weight - torch.Size([256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.0.norm1.bias - torch.Size([256]): The value is the same before and after 
calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.0.attn.relative_position_bias_table - torch.Size([169, 8]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.0.attn.qkv.weight - torch.Size([768, 256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.0.attn.qkv.bias - torch.Size([768]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.0.attn.proj.weight - torch.Size([256, 256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.0.attn.proj.bias - torch.Size([256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.0.norm2.weight - torch.Size([256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.0.norm2.bias - torch.Size([256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.0.mlp.fc1.weight - torch.Size([1024, 256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.0.mlp.fc1.bias - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.0.mlp.fc2.weight - torch.Size([256, 1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.0.mlp.fc2.bias - torch.Size([256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.1.norm1.weight - torch.Size([256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill 
teacher.backbone.layers.1.blocks.1.norm1.bias - torch.Size([256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.1.attn.relative_position_bias_table - torch.Size([169, 8]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.1.attn.qkv.weight - torch.Size([768, 256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.1.attn.qkv.bias - torch.Size([768]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.1.attn.proj.weight - torch.Size([256, 256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.1.attn.proj.bias - torch.Size([256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.1.norm2.weight - torch.Size([256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.1.norm2.bias - torch.Size([256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.1.mlp.fc1.weight - torch.Size([1024, 256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.1.mlp.fc1.bias - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.1.mlp.fc2.weight - torch.Size([256, 1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.blocks.1.mlp.fc2.bias - torch.Size([256]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.downsample.norm.weight - torch.Size([1024]): The value 
is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.downsample.norm.bias - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.1.downsample.reduction.weight - torch.Size([512, 1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.0.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.0.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.0.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.0.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.0.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.0.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.0.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.0.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.0.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.0.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill 
teacher.backbone.layers.2.blocks.0.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.0.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.0.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.1.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.1.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.1.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.1.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.1.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.1.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.1.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.1.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.1.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.1.mlp.fc1.weight - torch.Size([2048, 512]): The value 
is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.1.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.1.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.1.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.2.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.2.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.2.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.2.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.2.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.2.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.2.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.2.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.2.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill 
teacher.backbone.layers.2.blocks.2.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.2.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.2.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.2.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.3.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.3.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.3.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.3.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.3.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.3.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.3.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.3.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.3.norm2.bias - torch.Size([512]): The value 
is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.3.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.3.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.3.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.3.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.4.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.4.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.4.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.4.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.4.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.4.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.4.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.4.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill 
teacher.backbone.layers.2.blocks.4.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.4.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.4.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.4.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.4.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.5.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.5.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.5.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.5.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.5.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.5.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.5.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.5.norm2.weight - torch.Size([512]): The value 
is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.5.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.5.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.5.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.5.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.5.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.6.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.6.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.6.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.6.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.6.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.6.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.6.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill 
teacher.backbone.layers.2.blocks.6.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.6.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.6.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.6.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.6.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.6.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.7.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.7.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.7.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.7.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.7.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.7.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.7.attn.proj.bias - torch.Size([512]): The value 
is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.7.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.7.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.7.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.7.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.7.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.7.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.8.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.8.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.8.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.8.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.8.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.8.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill 
teacher.backbone.layers.2.blocks.8.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.8.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.8.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.8.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.8.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.8.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.8.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.9.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.9.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.9.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.9.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.9.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.9.attn.proj.weight - torch.Size([512, 512]): The value 
is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.9.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.9.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.9.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.9.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.9.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.9.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.9.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.10.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.10.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.10.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.10.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.10.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill 
teacher.backbone.layers.2.blocks.10.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.10.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.10.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.10.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.10.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.10.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.10.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.10.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.11.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.11.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.11.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.11.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.11.attn.qkv.bias - 
torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.11.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.11.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.11.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.11.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.11.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.11.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.11.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.11.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.12.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.12.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.12.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.12.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling 
`init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.12.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.12.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.12.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.12.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.12.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.12.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.12.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.12.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.12.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.13.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.13.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.13.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill 
teacher.backbone.layers.2.blocks.13.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.13.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.13.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.13.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.13.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.13.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.13.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.13.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.13.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.13.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.14.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.14.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.14.attn.relative_position_bias_table - torch.Size([169, 
16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.14.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.14.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.14.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.14.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.14.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.14.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.14.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.14.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.14.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.14.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.15.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.15.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill 
teacher.backbone.layers.2.blocks.15.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.15.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.15.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.15.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.15.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.15.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.15.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.15.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.15.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.15.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.15.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.16.norm1.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.16.norm1.bias - 
torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.16.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.16.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.16.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.16.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.16.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.16.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.16.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.16.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.16.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.16.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.16.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.17.norm1.weight - torch.Size([512]): The value is the same before and after calling 
`init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.17.norm1.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.17.attn.relative_position_bias_table - torch.Size([169, 16]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.17.attn.qkv.weight - torch.Size([1536, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.17.attn.qkv.bias - torch.Size([1536]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.17.attn.proj.weight - torch.Size([512, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.17.attn.proj.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.17.norm2.weight - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.17.norm2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.17.mlp.fc1.weight - torch.Size([2048, 512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.17.mlp.fc1.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.17.mlp.fc2.weight - torch.Size([512, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.blocks.17.mlp.fc2.bias - torch.Size([512]): The value is the same before and after calling `init_weights` of SingleTeacherDistill 
teacher.backbone.layers.2.downsample.norm.weight - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.downsample.norm.bias - torch.Size([2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.2.downsample.reduction.weight - torch.Size([1024, 2048]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.0.norm1.weight - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.0.norm1.bias - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.0.attn.relative_position_bias_table - torch.Size([169, 32]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.0.attn.qkv.weight - torch.Size([3072, 1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.0.attn.qkv.bias - torch.Size([3072]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.0.attn.proj.weight - torch.Size([1024, 1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.0.attn.proj.bias - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.0.norm2.weight - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.0.norm2.bias - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.0.mlp.fc1.weight - torch.Size([4096, 
1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.0.mlp.fc1.bias - torch.Size([4096]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.0.mlp.fc2.weight - torch.Size([1024, 4096]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.0.mlp.fc2.bias - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.1.norm1.weight - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.1.norm1.bias - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.1.attn.relative_position_bias_table - torch.Size([169, 32]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.1.attn.qkv.weight - torch.Size([3072, 1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.1.attn.qkv.bias - torch.Size([3072]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.1.attn.proj.weight - torch.Size([1024, 1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.1.attn.proj.bias - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.1.norm2.weight - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill teacher.backbone.layers.3.blocks.1.norm2.bias - torch.Size([1024]): The value is the same before and after calling `init_weights` of 
SingleTeacherDistill
teacher.backbone.layers.3.blocks.1.mlp.fc1.weight - torch.Size([4096, 1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill
teacher.backbone.layers.3.blocks.1.mlp.fc1.bias - torch.Size([4096]): The value is the same before and after calling `init_weights` of SingleTeacherDistill
teacher.backbone.layers.3.blocks.1.mlp.fc2.weight - torch.Size([1024, 4096]): The value is the same before and after calling `init_weights` of SingleTeacherDistill
teacher.backbone.layers.3.blocks.1.mlp.fc2.bias - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill
teacher.backbone.norm.weight - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill
teacher.backbone.norm.bias - torch.Size([1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill
teacher.cls_head.fc_cls.weight - torch.Size([400, 1024]): The value is the same before and after calling `init_weights` of SingleTeacherDistill
teacher.cls_head.fc_cls.bias - torch.Size([400]): The value is the same before and after calling `init_weights` of SingleTeacherDistill
2023/05/29 19:28:24 - mmengine - INFO - Auto resumed from the latest checkpoint /mnt/data/mmact/lilin/Repos/mmaction2/work_dirs/tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4/epoch_96.pth.
2023/05/29 19:28:24 - mmengine - INFO - Load checkpoint from /mnt/data/mmact/lilin/Repos/mmaction2/work_dirs/tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4/epoch_96.pth
2023/05/29 19:28:24 - mmengine - INFO - resumed epoch: 96, iter: 90240
2023/05/29 19:28:24 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
2023/05/29 19:28:24 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
2023/05/29 19:28:24 - mmengine - INFO - Checkpoints will be saved to /mnt/data/mmact/lilin/Repos/mmaction2/work_dirs/tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4.
2023/05/29 19:28:41 - mmengine - INFO - Epoch(train) [97][ 20/940] lr: 1.0000e-04 eta: 0:51:18 time: 0.8230 data_time: 0.1149 memory: 6021 grad_norm: 6.5763 loss: 3.6471 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 1.1103 distill.loss_dist: 2.5368
2023/05/29 19:28:49 - mmengine - INFO - Epoch(train) [97][ 40/940] lr: 1.0000e-04 eta: 0:38:09 time: 0.4079 data_time: 0.0076 memory: 6021 grad_norm: 6.6816 loss: 3.3565 student.top1_acc: 0.7500 student.top5_acc: 0.7500 student.loss_cls: 1.0646 distill.loss_dist: 2.2918
2023/05/29 19:28:57 - mmengine - INFO - Epoch(train) [97][ 60/940] lr: 1.0000e-04 eta: 0:33:40 time: 0.4070 data_time: 0.0075 memory: 6021 grad_norm: 6.7736 loss: 3.6660 student.top1_acc: 0.6250 student.top5_acc: 0.7500 student.loss_cls: 1.0820 distill.loss_dist: 2.5840
2023/05/29 19:29:05 - mmengine - INFO - Epoch(train) [97][ 80/940] lr: 1.0000e-04 eta: 0:31:13 time: 0.3986 data_time: 0.0072 memory: 6021 grad_norm: 6.6814 loss: 3.4649 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.1160 distill.loss_dist: 2.3488
2023/05/29 19:29:13 - mmengine - INFO - Epoch(train) [97][100/940] lr: 1.0000e-04 eta: 0:29:48 time: 0.4067 data_time: 0.0073 memory: 6021 grad_norm: 6.7277 loss: 3.3342 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 0.7542 distill.loss_dist: 2.5800
2023/05/29 19:29:22 - mmengine - INFO - Epoch(train) [97][120/940] lr: 1.0000e-04 eta: 0:28:48 time: 0.4066 data_time: 0.0074 memory: 6021 grad_norm: 6.5996 loss: 3.3161 student.top1_acc: 0.6250 student.top5_acc: 1.0000 student.loss_cls: 1.1498 distill.loss_dist: 2.1664
2023/05/29 19:29:30 - mmengine - INFO - Epoch(train) [97][140/940] lr: 1.0000e-04 eta: 0:28:03 time: 0.4058 data_time: 0.0074 memory: 6021 grad_norm: 6.5602 loss: 3.5661 student.top1_acc: 1.0000 
student.top5_acc: 1.0000 student.loss_cls: 1.0614 distill.loss_dist: 2.5047
2023/05/29 19:29:38 - mmengine - INFO - Epoch(train) [97][160/940] lr: 1.0000e-04 eta: 0:27:25 time: 0.4007 data_time: 0.0077 memory: 6021 grad_norm: 6.8317 loss: 3.4784 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.0921 distill.loss_dist: 2.3862
2023/05/29 19:29:46 - mmengine - INFO - Epoch(train) [97][180/940] lr: 1.0000e-04 eta: 0:26:56 time: 0.4083 data_time: 0.0075 memory: 6021 grad_norm: 6.7810 loss: 3.3169 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.0456 distill.loss_dist: 2.2714
2023/05/29 19:29:54 - mmengine - INFO - Epoch(train) [97][200/940] lr: 1.0000e-04 eta: 0:26:32 time: 0.4081 data_time: 0.0074 memory: 6021 grad_norm: 6.7650 loss: 3.3738 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.0281 distill.loss_dist: 2.3457
2023/05/29 19:30:02 - mmengine - INFO - Epoch(train) [97][220/940] lr: 1.0000e-04 eta: 0:26:08 time: 0.4002 data_time: 0.0075 memory: 6021 grad_norm: 6.6280 loss: 3.6571 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.1640 distill.loss_dist: 2.4931
2023/05/29 19:30:10 - mmengine - INFO - Epoch(train) [97][240/940] lr: 1.0000e-04 eta: 0:25:49 time: 0.4085 data_time: 0.0072 memory: 6021 grad_norm: 6.7731 loss: 3.0371 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 0.7885 distill.loss_dist: 2.2486
2023/05/29 19:30:18 - mmengine - INFO - Epoch(train) [97][260/940] lr: 1.0000e-04 eta: 0:25:30 time: 0.4021 data_time: 0.0076 memory: 6021 grad_norm: 6.8296 loss: 3.2197 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 0.8845 distill.loss_dist: 2.3352
2023/05/29 19:30:26 - mmengine - INFO - Epoch(train) [97][280/940] lr: 1.0000e-04 eta: 0:25:12 time: 0.4016 data_time: 0.0080 memory: 6021 grad_norm: 6.5911 loss: 3.2333 student.top1_acc: 0.7500 student.top5_acc: 0.7500 student.loss_cls: 0.9141 distill.loss_dist: 2.3193
2023/05/29 19:30:34 - 
mmengine - INFO - Epoch(train) [97][300/940] lr: 1.0000e-04 eta: 0:24:58 time: 0.4106 data_time: 0.0072 memory: 6021 grad_norm: 6.9362 loss: 3.5899 student.top1_acc: 0.6250 student.top5_acc: 1.0000 student.loss_cls: 1.0922 distill.loss_dist: 2.4978
2023/05/29 19:30:43 - mmengine - INFO - Epoch(train) [97][320/940] lr: 1.0000e-04 eta: 0:24:43 time: 0.4022 data_time: 0.0072 memory: 6021 grad_norm: 6.5766 loss: 3.3917 student.top1_acc: 0.6250 student.top5_acc: 1.0000 student.loss_cls: 0.8606 distill.loss_dist: 2.5310
2023/05/29 19:30:51 - mmengine - INFO - Epoch(train) [97][340/940] lr: 1.0000e-04 eta: 0:24:30 time: 0.4115 data_time: 0.0076 memory: 6021 grad_norm: 6.5896 loss: 3.5896 student.top1_acc: 0.5000 student.top5_acc: 0.8750 student.loss_cls: 1.1493 distill.loss_dist: 2.4403
2023/05/29 19:30:59 - mmengine - INFO - Epoch(train) [97][360/940] lr: 1.0000e-04 eta: 0:24:18 time: 0.4110 data_time: 0.0088 memory: 6021 grad_norm: 6.6122 loss: 3.6441 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.0575 distill.loss_dist: 2.5866
2023/05/29 19:31:07 - mmengine - INFO - Epoch(train) [97][380/940] lr: 1.0000e-04 eta: 0:24:04 time: 0.4021 data_time: 0.0078 memory: 6021 grad_norm: 6.6822 loss: 3.4523 student.top1_acc: 0.6250 student.top5_acc: 1.0000 student.loss_cls: 0.9739 distill.loss_dist: 2.4784
2023/05/29 19:31:15 - mmengine - INFO - Epoch(train) [97][400/940] lr: 1.0000e-04 eta: 0:23:52 time: 0.4056 data_time: 0.0085 memory: 6021 grad_norm: 6.9727 loss: 3.7741 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.1899 distill.loss_dist: 2.5842
2023/05/29 19:31:23 - mmengine - INFO - Epoch(train) [97][420/940] lr: 1.0000e-04 eta: 0:23:41 time: 0.4088 data_time: 0.0077 memory: 6021 grad_norm: 6.5663 loss: 3.3281 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.0483 distill.loss_dist: 2.2798
2023/05/29 19:31:31 - mmengine - INFO - Epoch(train) [97][440/940] lr: 1.0000e-04 eta: 0:23:29 time: 0.4008 data_time: 
0.0078 memory: 6021 grad_norm: 6.7992 loss: 3.3939 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.1602 distill.loss_dist: 2.2338
2023/05/29 19:31:39 - mmengine - INFO - Epoch(train) [97][460/940] lr: 1.0000e-04 eta: 0:23:17 time: 0.4019 data_time: 0.0076 memory: 6021 grad_norm: 6.7564 loss: 3.5182 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 1.1412 distill.loss_dist: 2.3770
2023/05/29 19:31:47 - mmengine - INFO - Epoch(train) [97][480/940] lr: 1.0000e-04 eta: 0:23:05 time: 0.4014 data_time: 0.0075 memory: 6021 grad_norm: 6.6893 loss: 3.3538 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 1.1497 distill.loss_dist: 2.2041
2023/05/29 19:31:55 - mmengine - INFO - Epoch(train) [97][500/940] lr: 1.0000e-04 eta: 0:22:54 time: 0.4018 data_time: 0.0076 memory: 6021 grad_norm: 6.5555 loss: 3.5464 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 0.9375 distill.loss_dist: 2.6089
2023/05/29 19:32:04 - mmengine - INFO - Epoch(train) [97][520/940] lr: 1.0000e-04 eta: 0:22:44 time: 0.4087 data_time: 0.0074 memory: 6021 grad_norm: 6.6423 loss: 3.1308 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 0.8301 distill.loss_dist: 2.3008
2023/05/29 19:32:12 - mmengine - INFO - Epoch(train) [97][540/940] lr: 1.0000e-04 eta: 0:22:34 time: 0.4094 data_time: 0.0076 memory: 6021 grad_norm: 6.6711 loss: 3.4939 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.0330 distill.loss_dist: 2.4609
2023/05/29 19:32:20 - mmengine - INFO - Epoch(train) [97][560/940] lr: 1.0000e-04 eta: 0:22:24 time: 0.4016 data_time: 0.0079 memory: 6021 grad_norm: 6.6909 loss: 3.7175 student.top1_acc: 0.7500 student.top5_acc: 0.7500 student.loss_cls: 1.2732 distill.loss_dist: 2.4443
2023/05/29 19:32:28 - mmengine - INFO - Epoch(train) [97][580/940] lr: 1.0000e-04 eta: 0:22:13 time: 0.4005 data_time: 0.0076 memory: 6021 grad_norm: 6.4537 loss: 3.3739 student.top1_acc: 0.7500 student.top5_acc: 0.8750 
student.loss_cls: 1.0645 distill.loss_dist: 2.3095
2023/05/29 19:32:36 - mmengine - INFO - Epoch(train) [97][600/940] lr: 1.0000e-04 eta: 0:22:03 time: 0.4012 data_time: 0.0076 memory: 6021 grad_norm: 6.7339 loss: 3.4291 student.top1_acc: 0.5000 student.top5_acc: 0.7500 student.loss_cls: 1.1228 distill.loss_dist: 2.3063
2023/05/29 19:32:44 - mmengine - INFO - Epoch(train) [97][620/940] lr: 1.0000e-04 eta: 0:21:53 time: 0.4005 data_time: 0.0077 memory: 6021 grad_norm: 6.7564 loss: 3.5098 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.1025 distill.loss_dist: 2.4073
2023/05/29 19:32:52 - mmengine - INFO - Epoch(train) [97][640/940] lr: 1.0000e-04 eta: 0:21:43 time: 0.4011 data_time: 0.0076 memory: 6021 grad_norm: 6.6798 loss: 3.5989 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 1.1539 distill.loss_dist: 2.4451
2023/05/29 19:33:00 - mmengine - INFO - Epoch(train) [97][660/940] lr: 1.0000e-04 eta: 0:21:33 time: 0.4082 data_time: 0.0076 memory: 6021 grad_norm: 6.6446 loss: 3.5282 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.0545 distill.loss_dist: 2.4737
2023/05/29 19:33:08 - mmengine - INFO - Epoch(train) [97][680/940] lr: 1.0000e-04 eta: 0:21:24 time: 0.4003 data_time: 0.0077 memory: 6021 grad_norm: 6.7553 loss: 3.5235 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 0.9646 distill.loss_dist: 2.5590
2023/05/29 19:33:16 - mmengine - INFO - Epoch(train) [97][700/940] lr: 1.0000e-04 eta: 0:21:14 time: 0.4005 data_time: 0.0079 memory: 6021 grad_norm: 6.7076 loss: 3.9051 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 1.2818 distill.loss_dist: 2.6233
2023/05/29 19:33:24 - mmengine - INFO - Epoch(train) [97][720/940] lr: 1.0000e-04 eta: 0:21:05 time: 0.4140 data_time: 0.0080 memory: 6021 grad_norm: 6.7916 loss: 3.3596 student.top1_acc: 0.5000 student.top5_acc: 0.7500 student.loss_cls: 0.9876 distill.loss_dist: 2.3720
2023/05/29 19:33:33 - mmengine - INFO - 
Epoch(train) [97][740/940] lr: 1.0000e-04 eta: 0:20:56 time: 0.4078 data_time: 0.0077 memory: 6021 grad_norm: 6.6358 loss: 3.4840 student.top1_acc: 0.7500 student.top5_acc: 0.7500 student.loss_cls: 1.2025 distill.loss_dist: 2.2814
2023/05/29 19:33:41 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809
2023/05/29 19:33:41 - mmengine - INFO - Epoch(train) [97][760/940] lr: 1.0000e-04 eta: 0:20:47 time: 0.3987 data_time: 0.0076 memory: 6021 grad_norm: 6.7344 loss: 3.6627 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.2234 distill.loss_dist: 2.4393
2023/05/29 19:33:49 - mmengine - INFO - Epoch(train) [97][780/940] lr: 1.0000e-04 eta: 0:20:38 time: 0.4079 data_time: 0.0074 memory: 6021 grad_norm: 6.6079 loss: 3.2952 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 0.7731 distill.loss_dist: 2.5221
2023/05/29 19:33:57 - mmengine - INFO - Epoch(train) [97][800/940] lr: 1.0000e-04 eta: 0:20:29 time: 0.4064 data_time: 0.0076 memory: 6021 grad_norm: 6.6835 loss: 3.6692 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.1446 distill.loss_dist: 2.5246
2023/05/29 19:34:05 - mmengine - INFO - Epoch(train) [97][820/940] lr: 1.0000e-04 eta: 0:20:19 time: 0.4019 data_time: 0.0071 memory: 6021 grad_norm: 6.8290 loss: 3.7493 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.1370 distill.loss_dist: 2.6123
2023/05/29 19:34:13 - mmengine - INFO - Epoch(train) [97][840/940] lr: 1.0000e-04 eta: 0:20:11 time: 0.4088 data_time: 0.0077 memory: 6021 grad_norm: 6.5624 loss: 3.4935 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 1.1180 distill.loss_dist: 2.3754
2023/05/29 19:34:21 - mmengine - INFO - Epoch(train) [97][860/940] lr: 1.0000e-04 eta: 0:20:02 time: 0.4075 data_time: 0.0076 memory: 6021 grad_norm: 6.6359 loss: 3.6196 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.1408 distill.loss_dist: 2.4788
2023/05/29 19:34:29 - 
mmengine - INFO - Epoch(train) [97][880/940] lr: 1.0000e-04 eta: 0:19:53 time: 0.4015 data_time: 0.0078 memory: 6021 grad_norm: 6.8319 loss: 3.6201 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.0598 distill.loss_dist: 2.5603
2023/05/29 19:34:37 - mmengine - INFO - Epoch(train) [97][900/940] lr: 1.0000e-04 eta: 0:19:44 time: 0.4017 data_time: 0.0078 memory: 6021 grad_norm: 6.6837 loss: 3.7207 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.1297 distill.loss_dist: 2.5910
2023/05/29 19:34:45 - mmengine - INFO - Epoch(train) [97][920/940] lr: 1.0000e-04 eta: 0:19:34 time: 0.4005 data_time: 0.0079 memory: 6021 grad_norm: 6.5485 loss: 3.5911 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 1.1783 distill.loss_dist: 2.4127
2023/05/29 19:34:53 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809
2023/05/29 19:34:53 - mmengine - INFO - Epoch(train) [97][940/940] lr: 1.0000e-04 eta: 0:19:24 time: 0.3813 data_time: 0.0069 memory: 6021 grad_norm: 7.0299 loss: 3.6984 student.top1_acc: 0.5000 student.top5_acc: 1.0000 student.loss_cls: 1.1335 distill.loss_dist: 2.5649
2023/05/29 19:35:01 - mmengine - INFO - Epoch(val) [97][20/78] eta: 0:00:22 time: 0.3823 data_time: 0.3298 memory: 1444
2023/05/29 19:35:05 - mmengine - INFO - Epoch(val) [97][40/78] eta: 0:00:11 time: 0.2025 data_time: 0.1509 memory: 1444
2023/05/29 19:35:11 - mmengine - INFO - Epoch(val) [97][60/78] eta: 0:00:05 time: 0.2924 data_time: 0.2409 memory: 1444
2023/05/29 19:35:22 - mmengine - INFO - Epoch(val) [97][20/78] eta: 0:01:19 time: 0.3425 data_time: 0.1587 memory: 2227
2023/05/29 19:35:27 - mmengine - INFO - Epoch(val) [97][40/78] eta: 0:00:31 time: 0.2619 data_time: 0.0781 memory: 2227
2023/05/29 19:35:33 - mmengine - INFO - Epoch(val) [97][60/78] eta: 0:00:11 time: 0.2719 data_time: 0.0879 memory: 2227
2023/05/29 19:35:38 - mmengine - INFO - Epoch(val) [97][78/78] acc/top1: 0.7303 acc/top5: 0.9076 
acc/mean1: 0.7302 teacher.acc/top1: 0.7727 teacher.acc/top5: 0.9298 teacher.acc/mean1: 0.7726 data_time: 0.0839 time: 0.2641
2023/05/29 19:35:48 - mmengine - INFO - Epoch(train) [98][ 20/940] lr: 1.0000e-04 eta: 0:19:22 time: 0.5179 data_time: 0.0843 memory: 6021 grad_norm: 6.6476 loss: 3.4853 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 1.1274 distill.loss_dist: 2.3579
2023/05/29 19:35:56 - mmengine - INFO - Epoch(train) [98][ 40/940] lr: 1.0000e-04 eta: 0:19:13 time: 0.4002 data_time: 0.0075 memory: 6021 grad_norm: 6.5761 loss: 3.5652 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.3698 distill.loss_dist: 2.1954
2023/05/29 19:36:04 - mmengine - INFO - Epoch(train) [98][ 60/940] lr: 1.0000e-04 eta: 0:19:04 time: 0.4011 data_time: 0.0076 memory: 6021 grad_norm: 6.6800 loss: 3.2708 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 1.1237 distill.loss_dist: 2.1471
2023/05/29 19:36:12 - mmengine - INFO - Epoch(train) [98][ 80/940] lr: 1.0000e-04 eta: 0:18:55 time: 0.4029 data_time: 0.0076 memory: 6021 grad_norm: 6.7007 loss: 3.0832 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 0.7699 distill.loss_dist: 2.3133
2023/05/29 19:36:20 - mmengine - INFO - Epoch(train) [98][100/940] lr: 1.0000e-04 eta: 0:18:46 time: 0.4050 data_time: 0.0075 memory: 6021 grad_norm: 6.9197 loss: 3.7876 student.top1_acc: 0.3750 student.top5_acc: 0.5000 student.loss_cls: 1.1995 distill.loss_dist: 2.5881
2023/05/29 19:36:29 - mmengine - INFO - Epoch(train) [98][120/940] lr: 1.0000e-04 eta: 0:18:38 time: 0.4084 data_time: 0.0083 memory: 6021 grad_norm: 6.7849 loss: 3.5438 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.2907 distill.loss_dist: 2.2532
2023/05/29 19:36:37 - mmengine - INFO - Epoch(train) [98][140/940] lr: 1.0000e-04 eta: 0:18:29 time: 0.4062 data_time: 0.0076 memory: 6021 grad_norm: 6.6913 loss: 3.5529 student.top1_acc: 0.5000 student.top5_acc: 0.7500 student.loss_cls: 1.2141 
distill.loss_dist: 2.3388 2023/05/29 19:36:45 - mmengine - INFO - Epoch(train) [98][160/940] lr: 1.0000e-04 eta: 0:18:20 time: 0.4072 data_time: 0.0077 memory: 6021 grad_norm: 6.5471 loss: 3.6248 student.top1_acc: 0.6250 student.top5_acc: 0.7500 student.loss_cls: 1.0056 distill.loss_dist: 2.6191 2023/05/29 19:36:53 - mmengine - INFO - Epoch(train) [98][180/940] lr: 1.0000e-04 eta: 0:18:12 time: 0.4104 data_time: 0.0078 memory: 6021 grad_norm: 6.5732 loss: 3.6400 student.top1_acc: 0.5000 student.top5_acc: 1.0000 student.loss_cls: 1.2101 distill.loss_dist: 2.4300 2023/05/29 19:37:01 - mmengine - INFO - Epoch(train) [98][200/940] lr: 1.0000e-04 eta: 0:18:03 time: 0.4008 data_time: 0.0074 memory: 6021 grad_norm: 6.5725 loss: 3.3579 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.2280 distill.loss_dist: 2.1298 2023/05/29 19:37:09 - mmengine - INFO - Epoch(train) [98][220/940] lr: 1.0000e-04 eta: 0:17:54 time: 0.4050 data_time: 0.0082 memory: 6021 grad_norm: 6.4721 loss: 3.5251 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.1121 distill.loss_dist: 2.4130 2023/05/29 19:37:17 - mmengine - INFO - Epoch(train) [98][240/940] lr: 1.0000e-04 eta: 0:17:46 time: 0.4087 data_time: 0.0077 memory: 6021 grad_norm: 6.6136 loss: 3.1402 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 0.9086 distill.loss_dist: 2.2316 2023/05/29 19:37:25 - mmengine - INFO - Epoch(train) [98][260/940] lr: 1.0000e-04 eta: 0:17:37 time: 0.3989 data_time: 0.0077 memory: 6021 grad_norm: 6.6526 loss: 3.5407 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.1105 distill.loss_dist: 2.4302 2023/05/29 19:37:34 - mmengine - INFO - Epoch(train) [98][280/940] lr: 1.0000e-04 eta: 0:17:29 time: 0.4074 data_time: 0.0080 memory: 6021 grad_norm: 6.7712 loss: 3.7253 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.1347 distill.loss_dist: 2.5906 2023/05/29 19:37:42 - mmengine - INFO - Epoch(train) [98][300/940] lr: 
1.0000e-04 eta: 0:17:20 time: 0.4098 data_time: 0.0075 memory: 6021 grad_norm: 6.6427 loss: 3.4748 student.top1_acc: 0.5000 student.top5_acc: 1.0000 student.loss_cls: 1.0991 distill.loss_dist: 2.3757 2023/05/29 19:37:50 - mmengine - INFO - Epoch(train) [98][320/940] lr: 1.0000e-04 eta: 0:17:12 time: 0.4064 data_time: 0.0080 memory: 6021 grad_norm: 6.8392 loss: 3.4791 student.top1_acc: 0.6250 student.top5_acc: 0.7500 student.loss_cls: 1.1006 distill.loss_dist: 2.3785 2023/05/29 19:37:58 - mmengine - INFO - Epoch(train) [98][340/940] lr: 1.0000e-04 eta: 0:17:03 time: 0.4102 data_time: 0.0084 memory: 6021 grad_norm: 6.7068 loss: 3.6548 student.top1_acc: 0.3750 student.top5_acc: 0.7500 student.loss_cls: 1.1245 distill.loss_dist: 2.5303 2023/05/29 19:38:06 - mmengine - INFO - Epoch(train) [98][360/940] lr: 1.0000e-04 eta: 0:16:55 time: 0.4011 data_time: 0.0083 memory: 6021 grad_norm: 6.7447 loss: 4.0055 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.1639 distill.loss_dist: 2.8416 2023/05/29 19:38:14 - mmengine - INFO - Epoch(train) [98][380/940] lr: 1.0000e-04 eta: 0:16:46 time: 0.4060 data_time: 0.0077 memory: 6021 grad_norm: 6.5377 loss: 3.2911 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 1.0291 distill.loss_dist: 2.2620 2023/05/29 19:38:22 - mmengine - INFO - Epoch(train) [98][400/940] lr: 1.0000e-04 eta: 0:16:38 time: 0.4084 data_time: 0.0081 memory: 6021 grad_norm: 6.7597 loss: 3.6150 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.1062 distill.loss_dist: 2.5088 2023/05/29 19:38:31 - mmengine - INFO - Epoch(train) [98][420/940] lr: 1.0000e-04 eta: 0:16:29 time: 0.4087 data_time: 0.0076 memory: 6021 grad_norm: 6.5082 loss: 3.6350 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.1374 distill.loss_dist: 2.4976 2023/05/29 19:38:39 - mmengine - INFO - Epoch(train) [98][440/940] lr: 1.0000e-04 eta: 0:16:21 time: 0.4084 data_time: 0.0075 memory: 6021 grad_norm: 6.7288 loss: 3.4664 
student.top1_acc: 0.7500 student.top5_acc: 0.7500 student.loss_cls: 1.1111 distill.loss_dist: 2.3553 2023/05/29 19:38:47 - mmengine - INFO - Epoch(train) [98][460/940] lr: 1.0000e-04 eta: 0:16:12 time: 0.4014 data_time: 0.0077 memory: 6021 grad_norm: 6.5957 loss: 3.2907 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.0395 distill.loss_dist: 2.2512 2023/05/29 19:38:55 - mmengine - INFO - Epoch(train) [98][480/940] lr: 1.0000e-04 eta: 0:16:04 time: 0.4012 data_time: 0.0072 memory: 6021 grad_norm: 6.7199 loss: 3.7779 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.1209 distill.loss_dist: 2.6569 2023/05/29 19:39:03 - mmengine - INFO - Epoch(train) [98][500/940] lr: 1.0000e-04 eta: 0:15:55 time: 0.4093 data_time: 0.0074 memory: 6021 grad_norm: 6.8476 loss: 3.4980 student.top1_acc: 0.5000 student.top5_acc: 0.7500 student.loss_cls: 1.2118 distill.loss_dist: 2.2862 2023/05/29 19:39:11 - mmengine - INFO - Epoch(train) [98][520/940] lr: 1.0000e-04 eta: 0:15:47 time: 0.4020 data_time: 0.0076 memory: 6021 grad_norm: 6.6238 loss: 3.6866 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.3669 distill.loss_dist: 2.3197 2023/05/29 19:39:19 - mmengine - INFO - Epoch(train) [98][540/940] lr: 1.0000e-04 eta: 0:15:38 time: 0.4079 data_time: 0.0077 memory: 6021 grad_norm: 6.3276 loss: 3.4668 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 1.0989 distill.loss_dist: 2.3679 2023/05/29 19:39:27 - mmengine - INFO - Epoch(train) [98][560/940] lr: 1.0000e-04 eta: 0:15:30 time: 0.4103 data_time: 0.0080 memory: 6021 grad_norm: 6.6249 loss: 3.2230 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.0226 distill.loss_dist: 2.2004 2023/05/29 19:39:36 - mmengine - INFO - Epoch(train) [98][580/940] lr: 1.0000e-04 eta: 0:15:22 time: 0.4031 data_time: 0.0075 memory: 6021 grad_norm: 6.4539 loss: 3.0269 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 0.9367 distill.loss_dist: 2.0902 
2023/05/29 19:39:44 - mmengine - INFO - Epoch(train) [98][600/940] lr: 1.0000e-04 eta: 0:15:13 time: 0.4018 data_time: 0.0074 memory: 6021 grad_norm: 6.6116 loss: 3.6049 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.0690 distill.loss_dist: 2.5359 2023/05/29 19:39:52 - mmengine - INFO - Epoch(train) [98][620/940] lr: 1.0000e-04 eta: 0:15:05 time: 0.4037 data_time: 0.0073 memory: 6021 grad_norm: 6.6852 loss: 3.4575 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 0.9746 distill.loss_dist: 2.4829 2023/05/29 19:40:00 - mmengine - INFO - Epoch(train) [98][640/940] lr: 1.0000e-04 eta: 0:14:56 time: 0.4071 data_time: 0.0079 memory: 6021 grad_norm: 6.6771 loss: 3.3604 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.0005 distill.loss_dist: 2.3599 2023/05/29 19:40:08 - mmengine - INFO - Epoch(train) [98][660/940] lr: 1.0000e-04 eta: 0:14:48 time: 0.4013 data_time: 0.0070 memory: 6021 grad_norm: 6.5437 loss: 3.4247 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 1.1717 distill.loss_dist: 2.2530 2023/05/29 19:40:16 - mmengine - INFO - Epoch(train) [98][680/940] lr: 1.0000e-04 eta: 0:14:39 time: 0.4010 data_time: 0.0075 memory: 6021 grad_norm: 6.7671 loss: 3.5452 student.top1_acc: 0.7500 student.top5_acc: 0.7500 student.loss_cls: 1.3275 distill.loss_dist: 2.2177 2023/05/29 19:40:24 - mmengine - INFO - Epoch(train) [98][700/940] lr: 1.0000e-04 eta: 0:14:31 time: 0.4102 data_time: 0.0075 memory: 6021 grad_norm: 6.6439 loss: 3.3310 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.0230 distill.loss_dist: 2.3080 2023/05/29 19:40:32 - mmengine - INFO - Epoch(train) [98][720/940] lr: 1.0000e-04 eta: 0:14:23 time: 0.4011 data_time: 0.0073 memory: 6021 grad_norm: 6.6304 loss: 3.4180 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.0487 distill.loss_dist: 2.3692 2023/05/29 19:40:40 - mmengine - INFO - Epoch(train) [98][740/940] lr: 1.0000e-04 eta: 0:14:14 time: 
0.4014 data_time: 0.0070 memory: 6021 grad_norm: 6.6456 loss: 3.4011 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.0667 distill.loss_dist: 2.3344 2023/05/29 19:40:48 - mmengine - INFO - Epoch(train) [98][760/940] lr: 1.0000e-04 eta: 0:14:06 time: 0.4087 data_time: 0.0072 memory: 6021 grad_norm: 6.6794 loss: 3.4682 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.0947 distill.loss_dist: 2.3735 2023/05/29 19:40:56 - mmengine - INFO - Epoch(train) [98][780/940] lr: 1.0000e-04 eta: 0:13:58 time: 0.4076 data_time: 0.0076 memory: 6021 grad_norm: 6.8405 loss: 3.3556 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 0.8674 distill.loss_dist: 2.4882 2023/05/29 19:41:05 - mmengine - INFO - Epoch(train) [98][800/940] lr: 1.0000e-04 eta: 0:13:49 time: 0.4063 data_time: 0.0076 memory: 6021 grad_norm: 6.6922 loss: 3.3954 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 1.2031 distill.loss_dist: 2.1924 2023/05/29 19:41:13 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809 2023/05/29 19:41:13 - mmengine - INFO - Epoch(train) [98][820/940] lr: 1.0000e-04 eta: 0:13:41 time: 0.4022 data_time: 0.0074 memory: 6021 grad_norm: 6.7688 loss: 3.4639 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.2353 distill.loss_dist: 2.2286 2023/05/29 19:41:21 - mmengine - INFO - Epoch(train) [98][840/940] lr: 1.0000e-04 eta: 0:13:33 time: 0.4098 data_time: 0.0077 memory: 6021 grad_norm: 6.7706 loss: 3.3413 student.top1_acc: 0.6250 student.top5_acc: 1.0000 student.loss_cls: 0.9879 distill.loss_dist: 2.3535 2023/05/29 19:41:29 - mmengine - INFO - Epoch(train) [98][860/940] lr: 1.0000e-04 eta: 0:13:24 time: 0.4013 data_time: 0.0072 memory: 6021 grad_norm: 6.8269 loss: 3.4975 student.top1_acc: 0.3750 student.top5_acc: 0.8750 student.loss_cls: 1.2236 distill.loss_dist: 2.2739 2023/05/29 19:41:37 - mmengine - INFO - Epoch(train) [98][880/940] lr: 1.0000e-04 eta: 
0:13:16 time: 0.4016 data_time: 0.0076 memory: 6021 grad_norm: 6.6681 loss: 3.5887 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 1.1269 distill.loss_dist: 2.4618 2023/05/29 19:41:45 - mmengine - INFO - Epoch(train) [98][900/940] lr: 1.0000e-04 eta: 0:13:07 time: 0.4008 data_time: 0.0075 memory: 6021 grad_norm: 6.7021 loss: 3.6018 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 0.9680 distill.loss_dist: 2.6339 2023/05/29 19:41:53 - mmengine - INFO - Epoch(train) [98][920/940] lr: 1.0000e-04 eta: 0:12:59 time: 0.4005 data_time: 0.0077 memory: 6021 grad_norm: 6.7226 loss: 3.3504 student.top1_acc: 0.5000 student.top5_acc: 0.8750 student.loss_cls: 1.1069 distill.loss_dist: 2.2435 2023/05/29 19:42:01 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809 2023/05/29 19:42:01 - mmengine - INFO - Epoch(train) [98][940/940] lr: 1.0000e-04 eta: 0:12:50 time: 0.3915 data_time: 0.0069 memory: 6021 grad_norm: 7.2279 loss: 3.7729 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.0544 distill.loss_dist: 2.7185 2023/05/29 19:42:07 - mmengine - INFO - Epoch(val) [98][20/78] eta: 0:00:18 time: 0.3256 data_time: 0.2730 memory: 1444 2023/05/29 19:42:12 - mmengine - INFO - Epoch(val) [98][40/78] eta: 0:00:10 time: 0.2108 data_time: 0.1590 memory: 1444 2023/05/29 19:42:17 - mmengine - INFO - Epoch(val) [98][60/78] eta: 0:00:04 time: 0.2713 data_time: 0.2198 memory: 1444 2023/05/29 19:42:29 - mmengine - INFO - Epoch(val) [98][20/78] eta: 0:01:15 time: 0.3468 data_time: 0.1631 memory: 2227 2023/05/29 19:42:34 - mmengine - INFO - Epoch(val) [98][40/78] eta: 0:00:29 time: 0.2408 data_time: 0.0564 memory: 2227 2023/05/29 19:42:39 - mmengine - INFO - Epoch(val) [98][60/78] eta: 0:00:10 time: 0.2511 data_time: 0.0671 memory: 2227 2023/05/29 19:42:44 - mmengine - INFO - Epoch(val) [98][78/78] acc/top1: 0.7301 acc/top5: 0.9080 acc/mean1: 0.7300 teacher.acc/top1: 0.7727 teacher.acc/top5: 0.9298 
teacher.acc/mean1: 0.7726 data_time: 0.0780 time: 0.2586 2023/05/29 19:42:54 - mmengine - INFO - Epoch(train) [99][ 20/940] lr: 1.0000e-04 eta: 0:12:44 time: 0.5162 data_time: 0.0625 memory: 6021 grad_norm: 6.5578 loss: 3.3237 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 1.2855 distill.loss_dist: 2.0382 2023/05/29 19:43:03 - mmengine - INFO - Epoch(train) [99][ 40/940] lr: 1.0000e-04 eta: 0:12:36 time: 0.4095 data_time: 0.0076 memory: 6021 grad_norm: 6.8310 loss: 3.4588 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 0.9362 distill.loss_dist: 2.5226 2023/05/29 19:43:11 - mmengine - INFO - Epoch(train) [99][ 60/940] lr: 1.0000e-04 eta: 0:12:28 time: 0.4021 data_time: 0.0077 memory: 6021 grad_norm: 6.7432 loss: 3.6601 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.2053 distill.loss_dist: 2.4548 2023/05/29 19:43:19 - mmengine - INFO - Epoch(train) [99][ 80/940] lr: 1.0000e-04 eta: 0:12:19 time: 0.4091 data_time: 0.0073 memory: 6021 grad_norm: 6.7405 loss: 3.7851 student.top1_acc: 0.5000 student.top5_acc: 0.8750 student.loss_cls: 1.1699 distill.loss_dist: 2.6152 2023/05/29 19:43:27 - mmengine - INFO - Epoch(train) [99][100/940] lr: 1.0000e-04 eta: 0:12:11 time: 0.4021 data_time: 0.0078 memory: 6021 grad_norm: 6.5184 loss: 3.4570 student.top1_acc: 0.6250 student.top5_acc: 0.6250 student.loss_cls: 1.1725 distill.loss_dist: 2.2845 2023/05/29 19:43:35 - mmengine - INFO - Epoch(train) [99][120/940] lr: 1.0000e-04 eta: 0:12:03 time: 0.4104 data_time: 0.0076 memory: 6021 grad_norm: 6.6915 loss: 3.3968 student.top1_acc: 0.7500 student.top5_acc: 0.7500 student.loss_cls: 1.2046 distill.loss_dist: 2.1922 2023/05/29 19:43:43 - mmengine - INFO - Epoch(train) [99][140/940] lr: 1.0000e-04 eta: 0:11:54 time: 0.4072 data_time: 0.0076 memory: 6021 grad_norm: 6.7742 loss: 3.5145 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.1881 distill.loss_dist: 2.3263 2023/05/29 19:43:51 - mmengine - INFO - 
Epoch(train) [99][160/940] lr: 1.0000e-04 eta: 0:11:46 time: 0.4077 data_time: 0.0073 memory: 6021 grad_norm: 6.7853 loss: 3.6396 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 0.9922 distill.loss_dist: 2.6474 2023/05/29 19:43:59 - mmengine - INFO - Epoch(train) [99][180/940] lr: 1.0000e-04 eta: 0:11:38 time: 0.4002 data_time: 0.0073 memory: 6021 grad_norm: 6.5806 loss: 3.9444 student.top1_acc: 0.6250 student.top5_acc: 1.0000 student.loss_cls: 1.1127 distill.loss_dist: 2.8317 2023/05/29 19:44:07 - mmengine - INFO - Epoch(train) [99][200/940] lr: 1.0000e-04 eta: 0:11:29 time: 0.4001 data_time: 0.0075 memory: 6021 grad_norm: 6.7627 loss: 3.5107 student.top1_acc: 0.5000 student.top5_acc: 1.0000 student.loss_cls: 1.0501 distill.loss_dist: 2.4606 2023/05/29 19:44:15 - mmengine - INFO - Epoch(train) [99][220/940] lr: 1.0000e-04 eta: 0:11:21 time: 0.4014 data_time: 0.0074 memory: 6021 grad_norm: 6.5804 loss: 3.7871 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.2497 distill.loss_dist: 2.5374 2023/05/29 19:44:23 - mmengine - INFO - Epoch(train) [99][240/940] lr: 1.0000e-04 eta: 0:11:13 time: 0.4018 data_time: 0.0072 memory: 6021 grad_norm: 6.5318 loss: 3.4126 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 0.8909 distill.loss_dist: 2.5218 2023/05/29 19:44:31 - mmengine - INFO - Epoch(train) [99][260/940] lr: 1.0000e-04 eta: 0:11:04 time: 0.4044 data_time: 0.0073 memory: 6021 grad_norm: 6.4535 loss: 3.5633 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.1769 distill.loss_dist: 2.3864 2023/05/29 19:44:40 - mmengine - INFO - Epoch(train) [99][280/940] lr: 1.0000e-04 eta: 0:10:56 time: 0.4021 data_time: 0.0075 memory: 6021 grad_norm: 6.8424 loss: 3.6335 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.1811 distill.loss_dist: 2.4524 2023/05/29 19:44:48 - mmengine - INFO - Epoch(train) [99][300/940] lr: 1.0000e-04 eta: 0:10:48 time: 0.4003 data_time: 0.0073 memory: 6021 
grad_norm: 6.6503 loss: 3.7403 student.top1_acc: 0.6250 student.top5_acc: 0.6250 student.loss_cls: 1.1847 distill.loss_dist: 2.5556 2023/05/29 19:44:56 - mmengine - INFO - Epoch(train) [99][320/940] lr: 1.0000e-04 eta: 0:10:40 time: 0.4090 data_time: 0.0072 memory: 6021 grad_norm: 6.5631 loss: 3.6877 student.top1_acc: 0.6250 student.top5_acc: 0.7500 student.loss_cls: 0.9942 distill.loss_dist: 2.6935 2023/05/29 19:45:04 - mmengine - INFO - Epoch(train) [99][340/940] lr: 1.0000e-04 eta: 0:10:31 time: 0.3989 data_time: 0.0074 memory: 6021 grad_norm: 6.7353 loss: 3.5808 student.top1_acc: 0.5000 student.top5_acc: 1.0000 student.loss_cls: 1.0909 distill.loss_dist: 2.4900 2023/05/29 19:45:12 - mmengine - INFO - Epoch(train) [99][360/940] lr: 1.0000e-04 eta: 0:10:23 time: 0.4094 data_time: 0.0073 memory: 6021 grad_norm: 6.6775 loss: 3.4691 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.1082 distill.loss_dist: 2.3609 2023/05/29 19:45:20 - mmengine - INFO - Epoch(train) [99][380/940] lr: 1.0000e-04 eta: 0:10:15 time: 0.4026 data_time: 0.0076 memory: 6021 grad_norm: 6.5416 loss: 3.1985 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.0278 distill.loss_dist: 2.1707 2023/05/29 19:45:28 - mmengine - INFO - Epoch(train) [99][400/940] lr: 1.0000e-04 eta: 0:10:06 time: 0.4072 data_time: 0.0075 memory: 6021 grad_norm: 6.6624 loss: 3.3202 student.top1_acc: 0.6250 student.top5_acc: 0.7500 student.loss_cls: 1.1027 distill.loss_dist: 2.2175 2023/05/29 19:45:36 - mmengine - INFO - Epoch(train) [99][420/940] lr: 1.0000e-04 eta: 0:09:58 time: 0.4021 data_time: 0.0073 memory: 6021 grad_norm: 6.5379 loss: 3.4516 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 1.1848 distill.loss_dist: 2.2668 2023/05/29 19:45:44 - mmengine - INFO - Epoch(train) [99][440/940] lr: 1.0000e-04 eta: 0:09:50 time: 0.4007 data_time: 0.0073 memory: 6021 grad_norm: 6.7396 loss: 3.5248 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 
1.0534 distill.loss_dist: 2.4714 2023/05/29 19:45:52 - mmengine - INFO - Epoch(train) [99][460/940] lr: 1.0000e-04 eta: 0:09:41 time: 0.4031 data_time: 0.0074 memory: 6021 grad_norm: 6.5564 loss: 3.4996 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.0923 distill.loss_dist: 2.4073 2023/05/29 19:46:00 - mmengine - INFO - Epoch(train) [99][480/940] lr: 1.0000e-04 eta: 0:09:33 time: 0.4129 data_time: 0.0077 memory: 6021 grad_norm: 6.5327 loss: 3.5497 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.1243 distill.loss_dist: 2.4254 2023/05/29 19:46:09 - mmengine - INFO - Epoch(train) [99][500/940] lr: 1.0000e-04 eta: 0:09:25 time: 0.4011 data_time: 0.0083 memory: 6021 grad_norm: 6.6420 loss: 3.5821 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 0.9895 distill.loss_dist: 2.5926 2023/05/29 19:46:17 - mmengine - INFO - Epoch(train) [99][520/940] lr: 1.0000e-04 eta: 0:09:17 time: 0.3999 data_time: 0.0074 memory: 6021 grad_norm: 6.7157 loss: 3.7549 student.top1_acc: 0.6250 student.top5_acc: 1.0000 student.loss_cls: 1.1374 distill.loss_dist: 2.6175 2023/05/29 19:46:25 - mmengine - INFO - Epoch(train) [99][540/940] lr: 1.0000e-04 eta: 0:09:08 time: 0.4024 data_time: 0.0075 memory: 6021 grad_norm: 6.8658 loss: 3.4891 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.1123 distill.loss_dist: 2.3768 2023/05/29 19:46:33 - mmengine - INFO - Epoch(train) [99][560/940] lr: 1.0000e-04 eta: 0:09:00 time: 0.4045 data_time: 0.0080 memory: 6021 grad_norm: 6.7044 loss: 3.4934 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 1.1509 distill.loss_dist: 2.3424 2023/05/29 19:46:41 - mmengine - INFO - Epoch(train) [99][580/940] lr: 1.0000e-04 eta: 0:08:52 time: 0.4092 data_time: 0.0077 memory: 6021 grad_norm: 6.7223 loss: 3.9800 student.top1_acc: 0.7500 student.top5_acc: 0.7500 student.loss_cls: 1.2285 distill.loss_dist: 2.7515 2023/05/29 19:46:49 - mmengine - INFO - Epoch(train) [99][600/940] 
lr: 1.0000e-04 eta: 0:08:44 time: 0.4097 data_time: 0.0073 memory: 6021 grad_norm: 6.6838 loss: 3.2693 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 0.9645 distill.loss_dist: 2.3047 2023/05/29 19:46:57 - mmengine - INFO - Epoch(train) [99][620/940] lr: 1.0000e-04 eta: 0:08:36 time: 0.3999 data_time: 0.0075 memory: 6021 grad_norm: 6.6219 loss: 3.4313 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.1388 distill.loss_dist: 2.2925 2023/05/29 19:47:05 - mmengine - INFO - Epoch(train) [99][640/940] lr: 1.0000e-04 eta: 0:08:27 time: 0.4089 data_time: 0.0076 memory: 6021 grad_norm: 6.5977 loss: 3.5154 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.1303 distill.loss_dist: 2.3851 2023/05/29 19:47:13 - mmengine - INFO - Epoch(train) [99][660/940] lr: 1.0000e-04 eta: 0:08:19 time: 0.4096 data_time: 0.0079 memory: 6021 grad_norm: 6.7454 loss: 3.6285 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.0750 distill.loss_dist: 2.5535 2023/05/29 19:47:22 - mmengine - INFO - Epoch(train) [99][680/940] lr: 1.0000e-04 eta: 0:08:11 time: 0.4040 data_time: 0.0075 memory: 6021 grad_norm: 6.7226 loss: 3.6915 student.top1_acc: 0.6250 student.top5_acc: 0.7500 student.loss_cls: 1.0397 distill.loss_dist: 2.6519 2023/05/29 19:47:30 - mmengine - INFO - Epoch(train) [99][700/940] lr: 1.0000e-04 eta: 0:08:03 time: 0.4013 data_time: 0.0075 memory: 6021 grad_norm: 6.6827 loss: 3.4237 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.0933 distill.loss_dist: 2.3305 2023/05/29 19:47:38 - mmengine - INFO - Epoch(train) [99][720/940] lr: 1.0000e-04 eta: 0:07:54 time: 0.4012 data_time: 0.0078 memory: 6021 grad_norm: 6.5270 loss: 3.1495 student.top1_acc: 0.6250 student.top5_acc: 0.7500 student.loss_cls: 0.9665 distill.loss_dist: 2.1830 2023/05/29 19:47:46 - mmengine - INFO - Epoch(train) [99][740/940] lr: 1.0000e-04 eta: 0:07:46 time: 0.4018 data_time: 0.0079 memory: 6021 grad_norm: 6.6732 loss: 3.5604 
student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.1737 distill.loss_dist: 2.3867 2023/05/29 19:47:54 - mmengine - INFO - Epoch(train) [99][760/940] lr: 1.0000e-04 eta: 0:07:38 time: 0.4010 data_time: 0.0078 memory: 6021 grad_norm: 6.6320 loss: 3.9308 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 1.3403 distill.loss_dist: 2.5905 2023/05/29 19:48:02 - mmengine - INFO - Epoch(train) [99][780/940] lr: 1.0000e-04 eta: 0:07:30 time: 0.4065 data_time: 0.0075 memory: 6021 grad_norm: 6.6592 loss: 3.3717 student.top1_acc: 0.6250 student.top5_acc: 0.7500 student.loss_cls: 1.1647 distill.loss_dist: 2.2070 2023/05/29 19:48:10 - mmengine - INFO - Epoch(train) [99][800/940] lr: 1.0000e-04 eta: 0:07:21 time: 0.4010 data_time: 0.0076 memory: 6021 grad_norm: 6.6345 loss: 3.1227 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 1.1099 distill.loss_dist: 2.0127 2023/05/29 19:48:18 - mmengine - INFO - Epoch(train) [99][820/940] lr: 1.0000e-04 eta: 0:07:13 time: 0.4031 data_time: 0.0075 memory: 6021 grad_norm: 6.6249 loss: 3.1052 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 0.9652 distill.loss_dist: 2.1400 2023/05/29 19:48:26 - mmengine - INFO - Epoch(train) [99][840/940] lr: 1.0000e-04 eta: 0:07:05 time: 0.4102 data_time: 0.0078 memory: 6021 grad_norm: 6.8591 loss: 3.3259 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 0.9574 distill.loss_dist: 2.3686 2023/05/29 19:48:34 - mmengine - INFO - Epoch(train) [99][860/940] lr: 1.0000e-04 eta: 0:06:57 time: 0.4103 data_time: 0.0076 memory: 6021 grad_norm: 6.6490 loss: 3.3910 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 0.9121 distill.loss_dist: 2.4789 2023/05/29 19:48:42 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809 2023/05/29 19:48:42 - mmengine - INFO - Epoch(train) [99][880/940] lr: 1.0000e-04 eta: 0:06:49 time: 0.3993 data_time: 0.0078 memory: 6021 grad_norm: 6.6950 
loss: 3.3466 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 0.9527 distill.loss_dist: 2.3939 2023/05/29 19:48:50 - mmengine - INFO - Epoch(train) [99][900/940] lr: 1.0000e-04 eta: 0:06:40 time: 0.4005 data_time: 0.0079 memory: 6021 grad_norm: 6.6353 loss: 3.4208 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 0.9294 distill.loss_dist: 2.4914 2023/05/29 19:48:58 - mmengine - INFO - Epoch(train) [99][920/940] lr: 1.0000e-04 eta: 0:06:32 time: 0.4014 data_time: 0.0080 memory: 6021 grad_norm: 6.8030 loss: 3.3738 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 0.9829 distill.loss_dist: 2.3909 2023/05/29 19:49:06 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809 2023/05/29 19:49:06 - mmengine - INFO - Epoch(train) [99][940/940] lr: 1.0000e-04 eta: 0:06:24 time: 0.3897 data_time: 0.0070 memory: 6021 grad_norm: 7.0168 loss: 3.8146 student.top1_acc: 0.5000 student.top5_acc: 0.5000 student.loss_cls: 1.3354 distill.loss_dist: 2.4792 2023/05/29 19:49:06 - mmengine - INFO - Saving checkpoint at 99 epochs 2023/05/29 19:49:15 - mmengine - INFO - Epoch(val) [99][20/78] eta: 0:00:18 time: 0.3242 data_time: 0.2718 memory: 1444 2023/05/29 19:49:20 - mmengine - INFO - Epoch(val) [99][40/78] eta: 0:00:10 time: 0.2265 data_time: 0.1745 memory: 1444 2023/05/29 19:49:25 - mmengine - INFO - Epoch(val) [99][60/78] eta: 0:00:04 time: 0.2702 data_time: 0.2195 memory: 1444 2023/05/29 19:49:36 - mmengine - INFO - Epoch(val) [99][20/78] eta: 0:01:14 time: 0.3394 data_time: 0.1557 memory: 2227 2023/05/29 19:49:41 - mmengine - INFO - Epoch(val) [99][40/78] eta: 0:00:28 time: 0.2379 data_time: 0.0538 memory: 2227 2023/05/29 19:49:46 - mmengine - INFO - Epoch(val) [99][60/78] eta: 0:00:10 time: 0.2750 data_time: 0.0910 memory: 2227 2023/05/29 19:49:52 - mmengine - INFO - Epoch(val) [99][78/78] acc/top1: 0.7301 acc/top5: 0.9084 acc/mean1: 0.7301 teacher.acc/top1: 0.7727 teacher.acc/top5: 0.9298 
teacher.acc/mean1: 0.7726 data_time: 0.0803 time: 0.2607 2023/05/29 19:50:02 - mmengine - INFO - Epoch(train) [100][ 20/940] lr: 1.0000e-04 eta: 0:06:16 time: 0.5203 data_time: 0.0639 memory: 6021 grad_norm: 6.5640 loss: 3.2860 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 1.0043 distill.loss_dist: 2.2817 2023/05/29 19:50:10 - mmengine - INFO - Epoch(train) [100][ 40/940] lr: 1.0000e-04 eta: 0:06:08 time: 0.4079 data_time: 0.0074 memory: 6021 grad_norm: 6.6722 loss: 3.6259 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.0597 distill.loss_dist: 2.5661 2023/05/29 19:50:18 - mmengine - INFO - Epoch(train) [100][ 60/940] lr: 1.0000e-04 eta: 0:06:00 time: 0.4015 data_time: 0.0078 memory: 6021 grad_norm: 6.7225 loss: 3.6314 student.top1_acc: 0.6250 student.top5_acc: 1.0000 student.loss_cls: 1.0868 distill.loss_dist: 2.5446 2023/05/29 19:50:26 - mmengine - INFO - Epoch(train) [100][ 80/940] lr: 1.0000e-04 eta: 0:05:52 time: 0.4076 data_time: 0.0075 memory: 6021 grad_norm: 6.6391 loss: 3.3159 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 0.9660 distill.loss_dist: 2.3500 2023/05/29 19:50:34 - mmengine - INFO - Epoch(train) [100][100/940] lr: 1.0000e-04 eta: 0:05:43 time: 0.4011 data_time: 0.0076 memory: 6021 grad_norm: 6.5560 loss: 3.6439 student.top1_acc: 0.5000 student.top5_acc: 0.7500 student.loss_cls: 1.1835 distill.loss_dist: 2.4604 2023/05/29 19:50:43 - mmengine - INFO - Epoch(train) [100][120/940] lr: 1.0000e-04 eta: 0:05:35 time: 0.4017 data_time: 0.0077 memory: 6021 grad_norm: 6.6064 loss: 3.6195 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.0812 distill.loss_dist: 2.5384 2023/05/29 19:50:51 - mmengine - INFO - Epoch(train) [100][140/940] lr: 1.0000e-04 eta: 0:05:27 time: 0.4018 data_time: 0.0074 memory: 6021 grad_norm: 6.6118 loss: 3.2936 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 0.9966 distill.loss_dist: 2.2970 2023/05/29 19:50:59 - mmengine - INFO 
- Epoch(train) [100][160/940] lr: 1.0000e-04 eta: 0:05:19 time: 0.4093 data_time: 0.0078 memory: 6021 grad_norm: 6.6566 loss: 3.3778 student.top1_acc: 0.6250 student.top5_acc: 0.7500 student.loss_cls: 1.0720 distill.loss_dist: 2.3057 2023/05/29 19:51:07 - mmengine - INFO - Epoch(train) [100][180/940] lr: 1.0000e-04 eta: 0:05:11 time: 0.4016 data_time: 0.0076 memory: 6021 grad_norm: 6.6429 loss: 3.7370 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 1.3158 distill.loss_dist: 2.4212 2023/05/29 19:51:15 - mmengine - INFO - Epoch(train) [100][200/940] lr: 1.0000e-04 eta: 0:05:02 time: 0.4086 data_time: 0.0082 memory: 6021 grad_norm: 6.6441 loss: 3.6872 student.top1_acc: 0.5000 student.top5_acc: 0.7500 student.loss_cls: 1.3204 distill.loss_dist: 2.3668 2023/05/29 19:51:23 - mmengine - INFO - Epoch(train) [100][220/940] lr: 1.0000e-04 eta: 0:04:54 time: 0.4059 data_time: 0.0080 memory: 6021 grad_norm: 6.8146 loss: 3.3988 student.top1_acc: 0.6250 student.top5_acc: 0.7500 student.loss_cls: 1.0299 distill.loss_dist: 2.3690 2023/05/29 19:51:31 - mmengine - INFO - Epoch(train) [100][240/940] lr: 1.0000e-04 eta: 0:04:46 time: 0.4087 data_time: 0.0074 memory: 6021 grad_norm: 6.7500 loss: 3.6211 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 1.2076 distill.loss_dist: 2.4135 2023/05/29 19:51:39 - mmengine - INFO - Epoch(train) [100][260/940] lr: 1.0000e-04 eta: 0:04:38 time: 0.4098 data_time: 0.0080 memory: 6021 grad_norm: 6.6846 loss: 3.6462 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.1434 distill.loss_dist: 2.5028 2023/05/29 19:51:48 - mmengine - INFO - Epoch(train) [100][280/940] lr: 1.0000e-04 eta: 0:04:30 time: 0.4085 data_time: 0.0076 memory: 6021 grad_norm: 6.7084 loss: 3.2385 student.top1_acc: 0.6250 student.top5_acc: 0.7500 student.loss_cls: 1.1064 distill.loss_dist: 2.1320 2023/05/29 19:51:56 - mmengine - INFO - Epoch(train) [100][300/940] lr: 1.0000e-04 eta: 0:04:21 time: 0.4038 data_time: 0.0069 
memory: 6021 grad_norm: 6.8262 loss: 3.5116 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.2196 distill.loss_dist: 2.2920 2023/05/29 19:52:04 - mmengine - INFO - Epoch(train) [100][320/940] lr: 1.0000e-04 eta: 0:04:13 time: 0.4141 data_time: 0.0075 memory: 6021 grad_norm: 6.7540 loss: 3.3487 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.0830 distill.loss_dist: 2.2657 2023/05/29 19:52:12 - mmengine - INFO - Epoch(train) [100][340/940] lr: 1.0000e-04 eta: 0:04:05 time: 0.3999 data_time: 0.0079 memory: 6021 grad_norm: 6.7524 loss: 2.9074 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 0.7916 distill.loss_dist: 2.1158 2023/05/29 19:52:20 - mmengine - INFO - Epoch(train) [100][360/940] lr: 1.0000e-04 eta: 0:03:57 time: 0.4013 data_time: 0.0078 memory: 6021 grad_norm: 6.8844 loss: 3.2534 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 0.8759 distill.loss_dist: 2.3775 2023/05/29 19:52:28 - mmengine - INFO - Epoch(train) [100][380/940] lr: 1.0000e-04 eta: 0:03:49 time: 0.4003 data_time: 0.0076 memory: 6021 grad_norm: 6.7551 loss: 3.3924 student.top1_acc: 0.7500 student.top5_acc: 0.7500 student.loss_cls: 1.2244 distill.loss_dist: 2.1680 2023/05/29 19:52:36 - mmengine - INFO - Epoch(train) [100][400/940] lr: 1.0000e-04 eta: 0:03:40 time: 0.4009 data_time: 0.0079 memory: 6021 grad_norm: 6.7060 loss: 3.5687 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.0190 distill.loss_dist: 2.5497 2023/05/29 19:52:44 - mmengine - INFO - Epoch(train) [100][420/940] lr: 1.0000e-04 eta: 0:03:32 time: 0.4025 data_time: 0.0080 memory: 6021 grad_norm: 6.5740 loss: 3.5382 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 1.0017 distill.loss_dist: 2.5364 2023/05/29 19:52:52 - mmengine - INFO - Epoch(train) [100][440/940] lr: 1.0000e-04 eta: 0:03:24 time: 0.4072 data_time: 0.0076 memory: 6021 grad_norm: 6.5829 loss: 3.4158 student.top1_acc: 0.5000 student.top5_acc: 0.7500 
student.loss_cls: 1.1492 distill.loss_dist: 2.2666 2023/05/29 19:53:00 - mmengine - INFO - Epoch(train) [100][460/940] lr: 1.0000e-04 eta: 0:03:16 time: 0.4003 data_time: 0.0079 memory: 6021 grad_norm: 6.8636 loss: 3.1364 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 0.8564 distill.loss_dist: 2.2800 2023/05/29 19:53:08 - mmengine - INFO - Epoch(train) [100][480/940] lr: 1.0000e-04 eta: 0:03:08 time: 0.4084 data_time: 0.0073 memory: 6021 grad_norm: 6.8285 loss: 3.3883 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.0149 distill.loss_dist: 2.3734 2023/05/29 19:53:17 - mmengine - INFO - Epoch(train) [100][500/940] lr: 1.0000e-04 eta: 0:02:59 time: 0.4068 data_time: 0.0074 memory: 6021 grad_norm: 6.7985 loss: 3.6195 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.1324 distill.loss_dist: 2.4871 2023/05/29 19:53:25 - mmengine - INFO - Epoch(train) [100][520/940] lr: 1.0000e-04 eta: 0:02:51 time: 0.4002 data_time: 0.0074 memory: 6021 grad_norm: 6.9088 loss: 3.7382 student.top1_acc: 0.6250 student.top5_acc: 0.7500 student.loss_cls: 1.1994 distill.loss_dist: 2.5388 2023/05/29 19:53:33 - mmengine - INFO - Epoch(train) [100][540/940] lr: 1.0000e-04 eta: 0:02:43 time: 0.4015 data_time: 0.0077 memory: 6021 grad_norm: 6.8663 loss: 3.4412 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.2751 distill.loss_dist: 2.1661 2023/05/29 19:53:41 - mmengine - INFO - Epoch(train) [100][560/940] lr: 1.0000e-04 eta: 0:02:35 time: 0.4124 data_time: 0.0074 memory: 6021 grad_norm: 6.8081 loss: 3.7097 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.0379 distill.loss_dist: 2.6718 2023/05/29 19:53:49 - mmengine - INFO - Epoch(train) [100][580/940] lr: 1.0000e-04 eta: 0:02:27 time: 0.4102 data_time: 0.0078 memory: 6021 grad_norm: 6.6122 loss: 3.3554 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 0.9057 distill.loss_dist: 2.4497 2023/05/29 19:53:57 - mmengine - INFO - 
Epoch(train) [100][600/940] lr: 1.0000e-04 eta: 0:02:18 time: 0.4004 data_time: 0.0075 memory: 6021 grad_norm: 6.5408 loss: 3.8258 student.top1_acc: 0.6250 student.top5_acc: 0.7500 student.loss_cls: 1.1851 distill.loss_dist: 2.6407 2023/05/29 19:54:05 - mmengine - INFO - Epoch(train) [100][620/940] lr: 1.0000e-04 eta: 0:02:10 time: 0.4024 data_time: 0.0073 memory: 6021 grad_norm: 6.6627 loss: 3.6948 student.top1_acc: 0.5000 student.top5_acc: 0.7500 student.loss_cls: 1.0571 distill.loss_dist: 2.6377 2023/05/29 19:54:13 - mmengine - INFO - Epoch(train) [100][640/940] lr: 1.0000e-04 eta: 0:02:02 time: 0.4020 data_time: 0.0076 memory: 6021 grad_norm: 6.5909 loss: 3.5604 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 1.1832 distill.loss_dist: 2.3772 2023/05/29 19:54:21 - mmengine - INFO - Epoch(train) [100][660/940] lr: 1.0000e-04 eta: 0:01:54 time: 0.4020 data_time: 0.0079 memory: 6021 grad_norm: 6.7040 loss: 3.0740 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 0.9197 distill.loss_dist: 2.1543 2023/05/29 19:54:29 - mmengine - INFO - Epoch(train) [100][680/940] lr: 1.0000e-04 eta: 0:01:46 time: 0.4012 data_time: 0.0075 memory: 6021 grad_norm: 6.7818 loss: 3.2584 student.top1_acc: 1.0000 student.top5_acc: 1.0000 student.loss_cls: 0.9521 distill.loss_dist: 2.3063 2023/05/29 19:54:37 - mmengine - INFO - Epoch(train) [100][700/940] lr: 1.0000e-04 eta: 0:01:38 time: 0.4005 data_time: 0.0077 memory: 6021 grad_norm: 6.6493 loss: 3.3947 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 1.0773 distill.loss_dist: 2.3174 2023/05/29 19:54:45 - mmengine - INFO - Epoch(train) [100][720/940] lr: 1.0000e-04 eta: 0:01:29 time: 0.4012 data_time: 0.0075 memory: 6021 grad_norm: 6.7105 loss: 3.3163 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.1346 distill.loss_dist: 2.1817 2023/05/29 19:54:53 - mmengine - INFO - Epoch(train) [100][740/940] lr: 1.0000e-04 eta: 0:01:21 time: 0.4015 data_time: 0.0076 memory: 
6021 grad_norm: 6.6816 loss: 3.2958 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 0.9234 distill.loss_dist: 2.3724 2023/05/29 19:55:02 - mmengine - INFO - Epoch(train) [100][760/940] lr: 1.0000e-04 eta: 0:01:13 time: 0.4091 data_time: 0.0075 memory: 6021 grad_norm: 6.6640 loss: 3.6503 student.top1_acc: 0.6250 student.top5_acc: 0.8750 student.loss_cls: 1.1211 distill.loss_dist: 2.5292 2023/05/29 19:55:10 - mmengine - INFO - Epoch(train) [100][780/940] lr: 1.0000e-04 eta: 0:01:05 time: 0.4086 data_time: 0.0074 memory: 6021 grad_norm: 6.5427 loss: 3.4866 student.top1_acc: 0.8750 student.top5_acc: 0.8750 student.loss_cls: 1.1649 distill.loss_dist: 2.3217 2023/05/29 19:55:18 - mmengine - INFO - Epoch(train) [100][800/940] lr: 1.0000e-04 eta: 0:00:57 time: 0.4093 data_time: 0.0076 memory: 6021 grad_norm: 6.9517 loss: 3.3781 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 0.9562 distill.loss_dist: 2.4218 2023/05/29 19:55:26 - mmengine - INFO - Epoch(train) [100][820/940] lr: 1.0000e-04 eta: 0:00:49 time: 0.4099 data_time: 0.0078 memory: 6021 grad_norm: 6.7308 loss: 3.5321 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 0.9932 distill.loss_dist: 2.5389 2023/05/29 19:55:34 - mmengine - INFO - Epoch(train) [100][840/940] lr: 1.0000e-04 eta: 0:00:40 time: 0.4014 data_time: 0.0076 memory: 6021 grad_norm: 6.5742 loss: 3.6169 student.top1_acc: 0.8750 student.top5_acc: 1.0000 student.loss_cls: 0.9482 distill.loss_dist: 2.6687 2023/05/29 19:55:42 - mmengine - INFO - Epoch(train) [100][860/940] lr: 1.0000e-04 eta: 0:00:32 time: 0.4094 data_time: 0.0078 memory: 6021 grad_norm: 6.5108 loss: 3.3429 student.top1_acc: 0.3750 student.top5_acc: 0.7500 student.loss_cls: 1.0236 distill.loss_dist: 2.3193 2023/05/29 19:55:50 - mmengine - INFO - Epoch(train) [100][880/940] lr: 1.0000e-04 eta: 0:00:24 time: 0.4063 data_time: 0.0075 memory: 6021 grad_norm: 6.8509 loss: 3.6765 student.top1_acc: 0.7500 student.top5_acc: 0.8750 
student.loss_cls: 1.1648 distill.loss_dist: 2.5117 2023/05/29 19:55:58 - mmengine - INFO - Epoch(train) [100][900/940] lr: 1.0000e-04 eta: 0:00:16 time: 0.3998 data_time: 0.0072 memory: 6021 grad_norm: 6.6916 loss: 3.5516 student.top1_acc: 0.7500 student.top5_acc: 0.8750 student.loss_cls: 1.1698 distill.loss_dist: 2.3818 2023/05/29 19:56:07 - mmengine - INFO - Epoch(train) [100][920/940] lr: 1.0000e-04 eta: 0:00:08 time: 0.4091 data_time: 0.0075 memory: 6021 grad_norm: 6.7991 loss: 3.3925 student.top1_acc: 0.7500 student.top5_acc: 1.0000 student.loss_cls: 0.8621 distill.loss_dist: 2.5304 2023/05/29 19:56:14 - mmengine - INFO - Exp name: tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4_20230529_192809 2023/05/29 19:56:14 - mmengine - INFO - Epoch(train) [100][940/940] lr: 1.0000e-04 eta: 0:00:00 time: 0.3814 data_time: 0.0075 memory: 6021 grad_norm: 6.7489 loss: 3.2966 student.top1_acc: 0.5000 student.top5_acc: 1.0000 student.loss_cls: 1.0397 distill.loss_dist: 2.2569 2023/05/29 19:56:14 - mmengine - INFO - Saving checkpoint at 100 epochs 2023/05/29 19:56:24 - mmengine - INFO - Epoch(val) [100][20/78] eta: 0:00:18 time: 0.3142 data_time: 0.2625 memory: 1444 2023/05/29 19:56:28 - mmengine - INFO - Epoch(val) [100][40/78] eta: 0:00:10 time: 0.2219 data_time: 0.1709 memory: 1444 2023/05/29 19:56:33 - mmengine - INFO - Epoch(val) [100][60/78] eta: 0:00:04 time: 0.2496 data_time: 0.1995 memory: 1444 2023/05/29 19:56:44 - mmengine - INFO - Epoch(val) [100][20/78] eta: 0:01:12 time: 0.3370 data_time: 0.1538 memory: 2227 2023/05/29 19:56:49 - mmengine - INFO - Epoch(val) [100][40/78] eta: 0:00:28 time: 0.2531 data_time: 0.0698 memory: 2227 2023/05/29 19:56:54 - mmengine - INFO - Epoch(val) [100][60/78] eta: 0:00:10 time: 0.2678 data_time: 0.0846 memory: 2227 2023/05/29 19:56:59 - mmengine - INFO - Epoch(val) [100][78/78] acc/top1: 0.7311 acc/top5: 0.9088 acc/mean1: 0.7310 teacher.acc/top1: 0.7727 teacher.acc/top5: 0.9298 teacher.acc/mean1: 0.7726 data_time: 0.0812 time: 
0.2610 2023/05/29 19:56:59 - mmengine - INFO - The previous best checkpoint /mnt/data/mmact/lilin/Repos/mmaction2/work_dirs/tsn_razor_dist_swin_r50_1x1x8_k400_dist_weight4/best_acc_top1_epoch_84.pth is removed 2023/05/29 19:57:01 - mmengine - INFO - The best checkpoint with 0.7311 acc/top1 at 100 epoch is saved to best_acc_top1_epoch_100.pth.