API Reference

mmaction.apis

mmaction.apis.inference_recognizer(model, video_path, label_path, use_frames=False, outputs=None, as_tensor=True)

Run inference on a video with the recognizer.

Parameters

model (nn.Module) – The loaded recognizer.
video_path (str) – The video file path/url or the rawframes directory path. If use_frames is set to True, it should be the rawframes directory path. Otherwise, it should be the video file path.
label_path (str) – The label file path.
use_frames (bool) – Whether to use rawframes as input. Default: False.
outputs (list(str) | tuple(str) | str | None) – Names of layers whose outputs need to be returned, default: None.
as_tensor (bool) – Same as that in OutputHook. Default: True.

Returns

dict[tuple(str, float)] – Top-5 recognition result dict.
dict[torch.tensor | np.ndarray] – Output feature maps from layers specified in outputs.

mmaction.apis.init_recognizer(config, checkpoint=None, device='cuda:0', use_frames=False)

Initialize a recognizer from a config file.

Parameters

config (str | mmcv.Config) – Config file path or the config object.
checkpoint (str | None, optional) – Checkpoint path/url. If set to None, the model will not load any weights. Default: None.
device (str | torch.device) – The desired device of returned tensor. Default: ‘cuda:0’.
use_frames (bool) – Whether to use rawframes as input. Default: False.

Returns

The constructed recognizer.

Return type

nn.Module
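
A minimal usage sketch that chains the two functions above. It assumes mmaction is installed; the function name recognize_video and all four path arguments are hypothetical placeholders, not part of the library. The import is deferred into the function body so that the sketch can be defined without the library present.

```python
def recognize_video(config_path, checkpoint_path, video_path, label_path,
                    device='cuda:0'):
    """Build a recognizer and run it on one video file (illustrative sketch)."""
    # Deferred import: mmaction is only required when the function is called.
    from mmaction.apis import inference_recognizer, init_recognizer

    # Build the model from the config and load weights from the checkpoint.
    model = init_recognizer(config_path, checkpoint_path, device=device)
    # use_frames keeps its default (False), so video_path is a video file.
    # outputs keeps its default (None), so no layer feature maps are requested.
    return inference_recognizer(model, video_path, label_path)
```

To run on a rawframes directory instead, the signatures above suggest passing use_frames=True to both calls.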

mmaction.apis.multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=True)

Test model with multiple gpus.

This method tests the model with multiple gpus and collects the results in one of two modes: gpu or cpu. With ‘gpu_collect=True’, it encodes the results as gpu tensors and uses gpu communication to collect them. In cpu mode, it saves the results from the different gpus to ‘tmpdir’ and the rank 0 worker collects them.

Parameters

model (nn.Module) – Model to be tested.
data_loader (torch.utils.data.DataLoader) – PyTorch data loader.
tmpdir (str) – Path of the directory in which to save the temporary results from different gpus in cpu mode. Default: None.
gpu_collect (bool) – Whether to use gpu (True) or cpu (False) to collect results. Default: True.

Returns

The prediction results.

Return type

list

mmaction.apis.single_gpu_test(model, data_loader)

Test model with a single gpu.

This method tests the model with a single gpu and displays a test progress bar.

Parameters

model (nn.Module) – Model to be tested.
data_loader (torch.utils.data.DataLoader) – PyTorch data loader.

Returns

The prediction results.

Return type

list

mmaction.apis.train_model(model, dataset, cfg, distributed=False, validate=False, test={'test_best': False, 'test_last': False}, timestamp=None, meta=None)

Train model entry function.

Parameters

model (nn.Module) – The model to be trained.
dataset (Dataset) – Train dataset.
cfg (dict) – The config dict for training.
distributed (bool) – Whether to use distributed training. Default: False.
validate (bool) – Whether to do evaluation. Default: False.
test (dict) – The testing option, with two keys: test_last & test_best. The value is True or False, indicating whether to test the corresponding checkpoint. Default: dict(test_best=False, test_last=False).
timestamp (str | None) – Local time for runner. Default: None.
meta (dict | None) – Meta dict to record some important information. Default: None

mmaction.core

optimizer

class mmaction.core.optimizer.CopyOfSGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)

A clone of torch.optim.SGD.

A customized optimizer could be defined like CopyOfSGD. You may derive from built-in optimizers in torch.optim, or directly implement a new optimizer.

class mmaction.core.optimizer.TSMOptimizerConstructor(optimizer_cfg, paramwise_cfg=None)

Optimizer constructor in TSM model.

This constructor builds the optimizer differently from the default one:

Parameters of the first conv layer have default lr and weight decay.
Parameters of BN layers have default lr and zero weight decay.
If the field “fc_lr5” in paramwise_cfg is set to True, the parameters of the last fc layer in cls_head have 5x lr multiplier and 10x weight decay multiplier.
Weights of other layers have default lr and weight decay, and biases have a 2x lr multiplier and zero weight decay.

add_params(params, model)

Add parameters and their corresponding lr and wd to the params.

Parameters

params (list) – The list to be modified, containing all parameter groups and their corresponding lr and wd configurations.
model (nn.Module) – The model to be trained with the optimizer.

evaluation

class mmaction.core.evaluation.ActivityNetLocalization(ground_truth_filename=None, prediction_filename=None, tiou_thresholds=array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]), verbose=False)

Class to evaluate detection results on ActivityNet.

Parameters

ground_truth_filename (str | None) – The filename of groundtruth. Default: None.
prediction_filename (str | None) – The filename of action detection results. Default: None.
tiou_thresholds (np.ndarray) – The thresholds of temporal iou to evaluate. Default: np.linspace(0.5, 0.95, 10).
verbose (bool) – Whether to print verbose logs. Default: False.

evaluate()

Evaluates a prediction file.

For the detection task, the interpolated mean average precision is used to measure the performance of a method.

wrapper_compute_average_precision(): Computes average precision for each class.

class mmaction.core.evaluation.DistEvalHook(dataloader, start=None, interval=1, by_epoch=True, save_best='auto', rule=None, broadcast_bn_buffer=True, tmpdir=None, gpu_collect=False, **eval_kwargs)

Distributed evaluation hook.

This hook performs evaluation at a given interval when running in a distributed environment.

Parameters

dataloader (DataLoader) – A PyTorch dataloader.
start (int | None, optional) – Evaluation starting epoch. It enables evaluation before the training starts if start <= the resuming epoch. If None, whether to evaluate is merely decided by interval. Default: None.
interval (int) – Evaluation interval. Default: 1.
by_epoch (bool) – Whether to perform evaluation by epoch or by iteration. If set to True, it will perform by epoch. Otherwise, by iteration. Default: True.
save_best (str | None, optional) – If a metric is specified, it is used to track the best checkpoint during evaluation. The information about the best checkpoint is saved in best.json. Options are the evaluation metrics of the test dataset, e.g., top1_acc, top5_acc, mean_class_accuracy, mean_average_precision, mmit_mean_average_precision for action recognition datasets (RawframeDataset and VideoDataset); AR@AN, auc for action localization datasets (ActivityNetDataset); mAP@0.5IOU for spatio-temporal action detection datasets (AVADataset). If save_best is auto, the first key of the returned OrderedDict result will be used. Default: ‘auto’.
rule (str | None, optional) – Comparison rule for the best score. If set to None, a reasonable rule is inferred: keys such as ‘acc’, ‘top’, etc. are compared with the ‘greater’ rule, and keys containing ‘loss’ are compared with the ‘less’ rule. Options are ‘greater’, ‘less’, None. Default: None.
tmpdir (str | None) – Temporary directory to save the results of all processes. Default: None.
gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.
broadcast_bn_buffer (bool) – Whether to broadcast the buffer(running_mean and running_var) of rank 0 to other rank before evaluation. Default: True.
**eval_kwargs – Evaluation arguments fed into the evaluate function of the dataset.

class mmaction.core.evaluation.EvalHook(dataloader, start=None, interval=1, by_epoch=True, save_best='auto', rule=None, **eval_kwargs)

Non-distributed evaluation hook.

Note

If new arguments are added for EvalHook, tools/test.py and tools/eval_metric.py may be affected.

This hook performs evaluation at a given interval when running in a non-distributed environment.

Parameters

dataloader (DataLoader) – A PyTorch dataloader.
start (int | None, optional) – Evaluation starting epoch. It enables evaluation before the training starts if start <= the resuming epoch. If None, whether to evaluate is merely decided by interval. Default: None.
interval (int) – Evaluation interval. Default: 1.
by_epoch (bool) – Whether to perform evaluation by epoch or by iteration. If set to True, it will perform by epoch. Otherwise, by iteration. Default: True.
save_best (str | None, optional) – If a metric is specified, it is used to track the best checkpoint during evaluation. The information about the best checkpoint is saved in best.json. Options are the evaluation metrics of the test dataset, e.g., top1_acc, top5_acc, mean_class_accuracy, mean_average_precision, mmit_mean_average_precision for action recognition datasets (RawframeDataset and VideoDataset); AR@AN, auc for action localization datasets (ActivityNetDataset); mAP@0.5IOU for spatio-temporal action detection datasets (AVADataset). If save_best is auto, the first key of the returned OrderedDict result will be used. Default: ‘auto’.
rule (str | None, optional) – Comparison rule for the best score. If set to None, a reasonable rule is inferred: keys such as ‘acc’, ‘top’, etc. are compared with the ‘greater’ rule, and keys containing ‘loss’ are compared with the ‘less’ rule. Options are ‘greater’, ‘less’, None. Default: None.
**eval_kwargs – Evaluation arguments fed into the evaluate function of the dataset.
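
The rule inference described for the rule parameter can be illustrated with a small sketch. The function name infer_rule_sketch and its two key tuples are assumptions made for illustration; they contain only the keys named in the description above, and the library's own key lists may differ.

```python
def infer_rule_sketch(key_indicator, greater_keys=('acc', 'top'),
                      less_keys=('loss',)):
    """Guess the comparison rule from a metric name (illustrative only)."""
    # Metrics whose name contains a "greater" key improve as they grow.
    if any(key in key_indicator for key in greater_keys):
        return 'greater'
    # Metrics whose name contains a "less" key improve as they shrink.
    if any(key in key_indicator for key in less_keys):
        return 'less'
    raise ValueError(f'Cannot infer a rule from key {key_indicator!r}; '
                     'pass rule explicitly.')


print(infer_rule_sketch('top1_acc'))  # greater
print(infer_rule_sketch('val_loss'))  # less
```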

after_train_epoch(runner): Called after every training epoch to evaluate the results.

after_train_iter(runner): Called after every training iter to evaluate the results.

before_train_epoch(runner): Evaluate the model only at the start of training by epoch.

before_train_iter(runner): Evaluate the model only at the start of training by iteration.

evaluate(runner, results)

Evaluate the results.

Parameters

runner (mmcv.Runner) – The underlying training runner.
results (list) – Output results.

evaluation_flag(runner)

Judge whether to perform evaluation.

Returns: The flag indicating whether to perform evaluation.
Return type: bool

mmaction.core.evaluation.average_precision_at_temporal_iou(ground_truth, prediction, temporal_iou_thresholds=array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]))

Compute average precision (in detection task) between ground truth and predicted data frames. If multiple predictions match the same ground-truth segment, only the one with the highest score is counted as a true positive. This code is greatly inspired by the Pascal VOC devkit.

Parameters

ground_truth (dict) – Dict containing the ground truth instances. Key: ‘video_id’. Value (np.ndarray): 1D array of ‘t-start’ and ‘t-end’.
prediction (np.ndarray) – 2D array containing the information of proposal instances, including ‘video_id’, ‘class_id’, ‘t-start’, ‘t-end’ and ‘score’.
temporal_iou_thresholds (np.ndarray) – 1D array with temporal_iou thresholds. Default: np.linspace(0.5, 0.95, 10).

Returns

1D array of average precision score.

Return type

np.ndarray

mmaction.core.evaluation.average_recall_at_avg_proposals(ground_truth, proposals, total_num_proposals, max_avg_proposals=None, temporal_iou_thresholds=array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]))

Computes the average recall given an average number (percentile) of proposals per video.

Parameters

ground_truth (dict) – Dict containing the ground truth instances.
proposals (dict) – Dict containing the proposal instances.
total_num_proposals (int) – Total number of proposals in the proposal dict.
max_avg_proposals (int | None) – Max number of proposals for one video. Default: None.
temporal_iou_thresholds (np.ndarray) – 1D array with temporal_iou thresholds. Default: np.linspace(0.5, 0.95, 10).

Returns

(recall, average_recall, proposals_per_video, auc). In recall, recall[i,j] is the recall at the i-th temporal_iou threshold and the j-th average number (percentile) of proposals per video. The average_recall is the recall averaged over the list of temporal_iou thresholds (1D array). This is equivalent to recall.mean(axis=0). The proposals_per_video is the average number of proposals per video. The auc is the area under the AR@AN curve.

Return type

tuple([np.ndarray, np.ndarray, np.ndarray, float])

mmaction.core.evaluation.confusion_matrix(y_pred, y_real, normalize=None)

Compute confusion matrix.

Parameters

y_pred (list[int] | np.ndarray[int]) – Prediction labels.
y_real (list[int] | np.ndarray[int]) – Ground truth labels.
normalize (str | None) – Normalizes the confusion matrix over the true (rows) or predicted (columns) conditions, or over the whole population. If None, the confusion matrix will not be normalized. Options are “true”, “pred”, “all”, None. Default: None.

Returns

Confusion matrix.

Return type

np.ndarray
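
The three normalization options can be illustrated with a small NumPy sketch. This is not the library's implementation: the name confusion_matrix_sketch is hypothetical, and the sketch assumes that labels are integers in the range 0..C-1, with rows indexing the true class and columns the predicted class.

```python
import numpy as np


def confusion_matrix_sketch(y_pred, y_real, normalize=None):
    """Count (true, predicted) label pairs and optionally normalize them."""
    y_pred = np.asarray(y_pred, dtype=np.int64)
    y_real = np.asarray(y_real, dtype=np.int64)
    # Assumption: labels are 0..C-1, so C is the largest label plus one.
    num_classes = int(max(y_pred.max(), y_real.max())) + 1
    mat = np.zeros((num_classes, num_classes), dtype=np.float64)
    # Rows are true labels, columns are predicted labels.
    np.add.at(mat, (y_real, y_pred), 1)
    with np.errstate(divide='ignore', invalid='ignore'):
        if normalize == 'true':
            mat = mat / mat.sum(axis=1, keepdims=True)  # each row sums to 1
        elif normalize == 'pred':
            mat = mat / mat.sum(axis=0, keepdims=True)  # each column sums to 1
        elif normalize == 'all':
            mat = mat / mat.sum()  # whole matrix sums to 1
    # Empty rows or columns would produce NaN; report them as zeros.
    return np.nan_to_num(mat)


print(confusion_matrix_sketch([0, 1, 1], [0, 1, 0], normalize='true'))
```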

mmaction.core.evaluation.get_weighted_score(score_list, coeff_list)

Get weighted score with given scores and coefficients.

Given n predictions by different classifiers: [score_1, score_2, …, score_n] (score_list) and their coefficients: [coeff_1, coeff_2, …, coeff_n] (coeff_list), return the weighted score: weighted_score = score_1 * coeff_1 + score_2 * coeff_2 + … + score_n * coeff_n

Parameters

score_list (list[list[np.ndarray]]) – List of list of scores, with shape n (number of predictions) X num_samples X num_classes.
coeff_list (list[float]) – List of coefficients, with shape n.

Returns

List of weighted scores.

Return type

list[np.ndarray]
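
The weighted sum above can be written in a few lines of NumPy. The sketch below is illustrative only (the name weighted_score_sketch is hypothetical) and assumes every classifier scores the same samples in the same order.

```python
import numpy as np


def weighted_score_sketch(score_list, coeff_list):
    """Combine per-classifier scores into one weighted score per sample."""
    assert len(score_list) == len(coeff_list)
    # Shape: (n classifiers, num_samples, num_classes).
    stacked = np.array(score_list, dtype=np.float64)
    coeffs = np.array(coeff_list, dtype=np.float64)
    # Scale each classifier's scores by its coefficient, then sum over n.
    weighted = (stacked * coeffs[:, None, None]).sum(axis=0)
    # Return one array of class scores per sample.
    return list(weighted)


rgb_scores = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
flow_scores = [np.array([0.0, 1.0]), np.array([0.0, 1.0])]
fused = weighted_score_sketch([rgb_scores, flow_scores], [1.0, 0.5])
```

Here rgb_scores and flow_scores are made-up inputs standing in for two classifiers evaluated on two samples.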

mmaction.core.evaluation.interpolated_precision_recall(precision, recall)

Interpolated AP - VOCdevkit from VOC 2011.

Parameters

precision (np.ndarray) – The precision of different thresholds.
recall (np.ndarray) – The recall of different thresholds.

Returns

Average precision score.

Return type

float
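
For readers unfamiliar with VOC-style interpolation, the sketch below shows one common way to compute it: make precision monotonically non-increasing from right to left, then sum the area of the rectangles where recall changes. It is written from the general technique, not copied from the library, and the name interpolated_ap_sketch is hypothetical.

```python
import numpy as np


def interpolated_ap_sketch(precision, recall):
    """Area under the interpolated precision-recall curve."""
    # Pad so the curve starts at recall 0 and ends at recall 1.
    mprecision = np.hstack([[0.0], precision, [0.0]])
    mrecall = np.hstack([[0.0], recall, [1.0]])
    # Interpolate: each precision becomes the max precision to its right.
    for i in range(len(mprecision) - 2, -1, -1):
        mprecision[i] = max(mprecision[i], mprecision[i + 1])
    # Indices where recall changes mark the right edge of each rectangle.
    idx = np.where(mrecall[1:] != mrecall[:-1])[0] + 1
    return float(np.sum((mrecall[idx] - mrecall[idx - 1]) * mprecision[idx]))


ap = interpolated_ap_sketch(np.array([1.0, 0.5]), np.array([0.5, 1.0]))
print(ap)  # 0.75
```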

mmaction.core.evaluation.mean_average_precision(scores, labels)

Mean average precision for multi-label recognition.

Parameters

scores (list[np.ndarray]) – Prediction scores of different classes for each sample.
labels (list[np.ndarray]) – Ground truth many-hot vector for each sample.

Returns

The mean average precision.

Return type

float

mmaction.core.evaluation.mean_class_accuracy(scores, labels)

Calculate mean class accuracy.

Parameters

scores (list[np.ndarray]) – Prediction scores for each class.
labels (list[int]) – Ground truth labels.

Returns

Mean class accuracy.

Return type

np.ndarray
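
As an illustration of the metric, the sketch below computes the accuracy separately for each class and then averages the per-class values, so rare classes weigh as much as frequent ones. It assumes the average runs over the classes that occur in the ground truth; the library may treat classes that only appear in predictions differently. The name mean_class_accuracy_sketch is hypothetical.

```python
import numpy as np


def mean_class_accuracy_sketch(scores, labels):
    """Average of per-class accuracies (recall of each ground-truth class)."""
    pred = np.argmax(np.asarray(scores), axis=1)
    labels = np.asarray(labels)
    per_class = []
    for cls in np.unique(labels):
        # Fraction of samples of this class that were predicted correctly.
        per_class.append(float(np.mean(pred[labels == cls] == cls)))
    return float(np.mean(per_class))


acc = mean_class_accuracy_sketch([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]], [0, 1, 1])
print(acc)  # 0.75
```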

mmaction.core.evaluation.mmit_mean_average_precision(scores, labels)

Mean average precision for multi-label recognition. Used for reporting MMIT style mAP on Multi-Moments in Time. The difference is that this method calculates the average precision for each sample and averages over samples.

Parameters

scores (list[np.ndarray]) – Prediction scores of different classes for each sample.
labels (list[np.ndarray]) – Ground truth many-hot vector for each sample.

Returns

The MMIT style mean average precision.

Return type

float

mmaction.core.evaluation.pairwise_temporal_iou(candidate_segments, target_segments, calculate_overlap_self=False)

Compute intersection over union between segments.

Parameters

candidate_segments (np.ndarray) – 1-dim/2-dim array in format [init, end]/[m x 2:=[init, end]].
target_segments (np.ndarray) – 2-dim array in format [n x 2:=[init, end]].
calculate_overlap_self (bool) – Whether to calculate overlap_self (union / candidate_length) or not. Default: False.

Returns

t_iou (np.ndarray) – 1-dim array [n] / 2-dim array [n x m] with IoU ratio.
t_overlap_self (np.ndarray, optional) – 1-dim array [n] / 2-dim array [n x m] with overlap_self, returned when calculate_overlap_self is True.
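
A vectorized NumPy sketch of the pairwise IoU computation follows. It is an illustration rather than the library's code: the name pairwise_temporal_iou_sketch is hypothetical, it always returns the 2-dim [n x m] form, and it omits the optional overlap_self output.

```python
import numpy as np


def pairwise_temporal_iou_sketch(candidate_segments, target_segments):
    """IoU between every target segment (rows) and candidate segment (columns)."""
    # Promote a single [init, end] candidate to shape (1, 2).
    candidates = np.atleast_2d(np.asarray(candidate_segments, dtype=float))
    targets = np.asarray(target_segments, dtype=float)
    # Broadcast to an (n, m) grid of intersection boundaries.
    start = np.maximum(targets[:, None, 0], candidates[None, :, 0])
    end = np.minimum(targets[:, None, 1], candidates[None, :, 1])
    # Non-overlapping pairs get a zero-length intersection.
    intersection = np.clip(end - start, 0, None)
    target_len = (targets[:, 1] - targets[:, 0])[:, None]
    candidate_len = (candidates[:, 1] - candidates[:, 0])[None, :]
    union = target_len + candidate_len - intersection
    return intersection / union


iou = pairwise_temporal_iou_sketch([0, 10], [[5, 15], [20, 30]])
print(iou.shape)  # (2, 1)
```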

mmaction.core.evaluation.softmax(x, dim=1): Compute softmax values for each set of scores in x.

mmaction.core.evaluation.top_k_accuracy(scores, labels, topk=(1,))

Calculate top k accuracy score.

Parameters

scores (list[np.ndarray]) – Prediction scores for each class.
labels (list[int]) – Ground truth labels.
topk (tuple[int]) – K value for top_k_accuracy. Default: (1,).

Returns

Top k accuracy score for each k.

Return type

list[float]
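
To make the metric concrete, here is a short NumPy sketch: a sample counts as correct for a given k when its ground-truth label is among the k highest-scoring classes. The name top_k_accuracy_sketch is hypothetical and the code is an illustration, not the library's implementation.

```python
import numpy as np


def top_k_accuracy_sketch(scores, labels, topk=(1,)):
    """Fraction of samples whose label is among the k best-scored classes."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)[:, None]  # shape (num_samples, 1)
    results = []
    for k in topk:
        # argsort is ascending, so the last k columns hold the top-k classes.
        top_k_classes = np.argsort(scores, axis=1)[:, -k:]
        hit = (top_k_classes == labels).any(axis=1)
        results.append(float(hit.mean()))
    return results


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
top1, top2 = top_k_accuracy_sketch(scores, [1, 1, 0], topk=(1, 2))
```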

lr

class mmaction.core.lr.TINLrUpdaterHook(min_lr, **kwargs)

mmaction.localization

localization

mmaction.localization.eval_ap(detections, gt_by_cls, iou_range)

Evaluate average precisions.

Parameters

detections (dict) – Results of detections.
gt_by_cls (dict) – Information of groundtruth.
iou_range (list) – Ranges of iou.

Returns

Average precision values of classes at ious.

Return type

list

mmaction.localization.generate_bsp_feature(video_list, video_infos, tem_results_dir, pgm_proposals_dir, top_k=1000, bsp_boundary_ratio=0.2, num_sample_start=8, num_sample_end=8, num_sample_action=16, num_sample_interp=3, tem_results_ext='.csv', pgm_proposal_ext='.csv', result_dict=None)

Generate Boundary-Sensitive Proposal Feature with given proposals.

Parameters

video_list (list[int]) – List of video indices to generate bsp_feature.
video_infos (list[dict]) – List of video_info dict that contains ‘video_name’.
tem_results_dir (str) – Directory to load temporal evaluation results.
pgm_proposals_dir (str) – Directory to load proposals.
top_k (int) – Number of proposals to be considered. Default: 1000.
bsp_boundary_ratio (float) – Ratio for proposal boundary (start/end). Default: 0.2.
num_sample_start (int) – Num of samples for actionness in start region. Default: 8.
num_sample_end (int) – Num of samples for actionness in end region. Default: 8.
num_sample_action (int) – Num of samples for actionness in center region. Default: 16.
num_sample_interp (int) – Num of samples for interpolation for each sample point. Default: 3.
tem_results_ext (str) – File extension for temporal evaluation model output. Default: ‘.csv’.
pgm_proposal_ext (str) – File extension for proposals. Default: ‘.csv’.
result_dict (dict | None) – The dict to save the results. Default: None.

Returns

bsp_feature_dict, a dict containing video_name as keys and bsp_feature as values. If result_dict is not None, the results are saved to it.

Return type

dict

mmaction.localization.generate_candidate_proposals(video_list, video_infos, tem_results_dir, temporal_scale, peak_threshold, tem_results_ext='.csv', result_dict=None)

Generate Candidate Proposals with given temporal evaluation results. Each proposal file will contain: ‘tmin,tmax,tmin_score,tmax_score,score,match_iou,match_ioa’.

Parameters

video_list (list[int]) – List of video indices to generate proposals.
video_infos (list[dict]) – List of video_info dict that contains ‘video_name’, ‘duration_frame’, ‘duration_second’, ‘feature_frame’, and ‘annotations’.
tem_results_dir (str) – Directory to load temporal evaluation results.
temporal_scale (int) – The number (scale) on temporal axis.
peak_threshold (float) – The threshold for proposal generation.
tem_results_ext (str) – File extension for temporal evaluation model output. Default: ‘.csv’.
result_dict (dict | None) – The dict to save the results. Default: None.

Returns

A dict containing video_name as keys and proposal list as values. If result_dict is not None, the results are saved to it.

Return type

dict

mmaction.localization.load_localize_proposal_file(filename)

Load the proposal file and split it into parts, each containing one video’s information.

Parameters: filename (str) – Path to the proposal file.
Returns: List of all videos’ information.
Return type: list

mmaction.localization.perform_regression(detections)

Perform regression on detection results.

Parameters: detections (list) – Detection results before regression.
Returns: Detection results after regression.
Return type: list

mmaction.localization.soft_nms(proposals, alpha, low_threshold, high_threshold, top_k)

Soft NMS for temporal proposals.

Parameters

proposals (np.ndarray) – Proposals generated by network.
alpha (float) – Alpha value of Gaussian decaying function.
low_threshold (float) – Low threshold for soft nms.
high_threshold (float) – High threshold for soft nms.
top_k (int) – Top k values to be considered.

Returns

The updated proposals.

Return type

np.ndarray

mmaction.localization.temporal_iop(proposal_min, proposal_max, gt_min, gt_max)

Compute IoP score between a groundtruth bbox and the proposals.

The IoP is defined as the overlap with the groundtruth divided by the duration of the proposal.

Parameters

proposal_min (list[float]) – List of temporal anchor min.
proposal_max (list[float]) – List of temporal anchor max.
gt_min (float) – Groundtruth temporal box min.
gt_max (float) – Groundtruth temporal box max.

Returns

List of intersection over anchor scores.

Return type

list[float]

mmaction.localization.temporal_iou(proposal_min, proposal_max, gt_min, gt_max)

Compute IoU score between a groundtruth bbox and the proposals.

Parameters

proposal_min (list[float]) – List of temporal anchor min.
proposal_max (list[float]) – List of temporal anchor max.
gt_min (float) – Groundtruth temporal box min.
gt_max (float) – Groundtruth temporal box max.

Returns

List of iou scores.

Return type

list[float]
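
The difference between the IoU and IoP scores of temporal_iou and temporal_iop lies only in the denominator, which the following sketch makes explicit. Both function names are hypothetical and the code is an illustration built from the definitions above, not the library's implementation.

```python
import numpy as np


def _intersection_and_length(proposal_min, proposal_max, gt_min, gt_max):
    """Shared helper: overlap with the groundtruth and proposal durations."""
    proposal_min = np.asarray(proposal_min, dtype=float)
    proposal_max = np.asarray(proposal_max, dtype=float)
    proposal_len = proposal_max - proposal_min
    inter_min = np.maximum(proposal_min, gt_min)
    inter_max = np.minimum(proposal_max, gt_max)
    # Proposals that do not touch the groundtruth get zero overlap.
    inter_len = np.maximum(inter_max - inter_min, 0.0)
    return inter_len, proposal_len


def temporal_iou_sketch(proposal_min, proposal_max, gt_min, gt_max):
    """Intersection divided by the union of proposal and groundtruth."""
    inter_len, proposal_len = _intersection_and_length(
        proposal_min, proposal_max, gt_min, gt_max)
    union_len = proposal_len + (gt_max - gt_min) - inter_len
    return inter_len / union_len


def temporal_iop_sketch(proposal_min, proposal_max, gt_min, gt_max):
    """Intersection divided by the duration of the proposal."""
    inter_len, proposal_len = _intersection_and_length(
        proposal_min, proposal_max, gt_min, gt_max)
    return inter_len / proposal_len


iou = temporal_iou_sketch([0, 8], [10, 12], gt_min=5, gt_max=10)
iop = temporal_iop_sketch([0, 8], [10, 12], gt_min=5, gt_max=10)
```

For the second proposal, [8, 12], the overlap with the groundtruth [5, 10] is 2, so the IoP is 2 / 4 while the IoU is 2 / 7.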

mmaction.localization.temporal_nms(detections, threshold)

Perform temporal non-maximum suppression on detection results.

Parameters

detections (list) – Detection results before NMS.
threshold (float) – Threshold of NMS.

Returns

Detection results after NMS.

Return type

list

mmaction.models

models

class mmaction.models.AudioRecognizer(backbone, cls_head, neck=None, train_cfg=None, test_cfg=None)

Audio recognizer model framework.

forward(audios, label=None, return_loss=True): Defines the computation performed at every call.

forward_gradcam(audios): Defines the computation performed at every call when using gradcam utils.

forward_test(audios): Defines the computation performed at every call during evaluation and testing.

forward_train(audios, labels): Defines the computation performed at every call when training.

train_step(data_batch, optimizer, **kwargs)

The iteration step during training.

This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, such as GAN, the whole process including back propagation and optimizer updating is also defined in this method.

Parameters

data_batch (dict) – The output of dataloader.
optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

Returns

It should contain at least 3 keys: loss, log_vars, num_samples. loss is a tensor for back propagation, which can be a weighted sum of multiple losses. log_vars contains all the variables to be sent to the logger. num_samples indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

Return type

dict

val_step(data_batch, optimizer, **kwargs)

The iteration step during validation.

This method shares the same signature as train_step(), but is used during val epochs. Note that the evaluation after training epochs is not implemented with this method, but with an evaluation hook.

class mmaction.models.AudioTSNHead(num_classes, in_channels, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', dropout_ratio=0.4, init_std=0.01, **kwargs)

Classification head for TSN on audio.

Parameters

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’).
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
dropout_ratio (float) – Probability of dropout layer. Default: 0.4.
init_std (float) – Std value for initialization. Default: 0.01.
kwargs (dict, optional) – Any keyword argument to be used to initialize the head.

forward(x)

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The classification scores for input samples.
Return type: torch.Tensor

init_weights(): Initialize the parameters from scratch.

class mmaction.models.BBoxHeadAVA(temporal_pool_type='avg', spatial_pool_type='max', in_channels=2048, num_classes=81, dropout_ratio=0, dropout_before_pool=True, topk=(3, 5), multilabel=True)

Simplest RoI head, with only two fc layers for classification and regression respectively.

Parameters

temporal_pool_type (str) – The temporal pool type. Choices are ‘avg’ or ‘max’. Default: ‘avg’.
spatial_pool_type (str) – The spatial pool type. Choices are ‘avg’ or ‘max’. Default: ‘max’.
in_channels (int) – The number of input channels. Default: 2048.
num_classes (int) – The number of classes. Default: 81.
dropout_ratio (float) – A float in [0, 1] that indicates the dropout ratio. Default: 0.
dropout_before_pool (bool) – Whether to apply dropout to the feature before spatial temporal pooling. Default: True.
topk (int or tuple[int]) – Parameter for evaluating multilabel accuracy. Default: (3, 5).
multilabel (bool) – Whether used for a multilabel task. Default: True. (Only multilabel == True is supported now.)

forward(x)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

recall_prec(pred_vec, target_vec)

Parameters

pred_vec (tensor[N x C]) – Each element is either 0 or 1.
target_vec (tensor[N x C]) – Each element is either 0 or 1.

class mmaction.models.BCELossWithLogits(loss_weight=1.0, class_weight=None)

Binary Cross Entropy Loss with logits.

Parameters

loss_weight (float) – Factor scalar multiplied on the loss. Default: 1.0.
class_weight (list[float] | None) – Loss weight for each class. If set as None, use the same weight 1 for all classes. Only applies to CrossEntropyLoss and BCELossWithLogits (should not be set when using other losses). Default: None.

class mmaction.models.BMN(temporal_dim, boundary_ratio, num_samples, num_samples_per_bin, feat_dim, soft_nms_alpha, soft_nms_low_threshold, soft_nms_high_threshold, post_process_top_k, feature_extraction_interval=16, loss_cls={'type': 'BMNLoss'}, hidden_dim_1d=256, hidden_dim_2d=128, hidden_dim_3d=512)

Boundary Matching Network for temporal action proposal generation.

Please refer to BMN: Boundary-Matching Network for Temporal Action Proposal Generation. Code reference: https://github.com/JJBOY/BMN-Boundary-Matching-Network

Parameters

temporal_dim (int) – Total frames selected for each video.
boundary_ratio (float) – Ratio for determining video boundaries.
num_samples (int) – Number of samples for each proposal.
num_samples_per_bin (int) – Number of bin samples for each sample.
feat_dim (int) – Feature dimension.
soft_nms_alpha (float) – Soft NMS alpha.
soft_nms_low_threshold (float) – Soft NMS low threshold.
soft_nms_high_threshold (float) – Soft NMS high threshold.
post_process_top_k (int) – Top k proposals in post process.
feature_extraction_interval (int) – Interval used in feature extraction. Default: 16.
loss_cls (dict) – Config for building loss. Default: dict(type='BMNLoss').
hidden_dim_1d (int) – Hidden dim for 1d conv. Default: 256.
hidden_dim_2d (int) – Hidden dim for 2d conv. Default: 128.
hidden_dim_3d (int) – Hidden dim for 3d conv. Default: 512.

forward(raw_feature, gt_bbox=None, video_meta=None, return_loss=True): Define the computation performed at every call.

forward_test(raw_feature, video_meta): Define the computation performed at every call when testing.

forward_train(raw_feature, label_confidence, label_start, label_end): Define the computation performed at every call when training.

generate_labels(gt_bbox): Generate training labels.

class mmaction.models.BMNLoss

BMN Loss.

From paper https://arxiv.org/abs/1907.09702, code https://github.com/JJBOY/BMN-Boundary-Matching-Network. It calculates the loss for the BMN model. This loss is a weighted sum of:

1) temporal evaluation loss based on confidence score of start and end positions.
2) proposal evaluation regression loss based on confidence scores of candidate proposals.
3) proposal evaluation classification loss based on classification results of candidate proposals.

forward(pred_bm, pred_start, pred_end, gt_iou_map, gt_start, gt_end, bm_mask, weight_tem=1.0, weight_pem_reg=10.0, weight_pem_cls=1.0)

Calculate Boundary Matching Network Loss.

Parameters

pred_bm (torch.Tensor) – Predicted confidence score for boundary matching map.
pred_start (torch.Tensor) – Predicted confidence score for start.
pred_end (torch.Tensor) – Predicted confidence score for end.
gt_iou_map (torch.Tensor) – Groundtruth score for boundary matching map.
gt_start (torch.Tensor) – Groundtruth temporal_iou score for start.
gt_end (torch.Tensor) – Groundtruth temporal_iou score for end.
bm_mask (torch.Tensor) – Boundary-Matching mask.
weight_tem (float) – Weight for tem loss. Default: 1.0.
weight_pem_reg (float) – Weight for pem regression loss. Default: 10.0.
weight_pem_cls (float) – Weight for pem classification loss. Default: 1.0.

Returns

(loss, tem_loss, pem_reg_loss, pem_cls_loss). loss is the BMN loss, tem_loss is the temporal evaluation loss, pem_reg_loss is the proposal evaluation regression loss, pem_cls_loss is the proposal evaluation classification loss.

Return type

tuple([torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor])
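
As a worked illustration of the weighted sum returned by forward, the sketch below combines three already-computed loss values using the default weights from the signature. It uses plain floats instead of tensors, and the function name combine_bmn_losses is hypothetical.

```python
def combine_bmn_losses(tem_loss, pem_reg_loss, pem_cls_loss,
                       weight_tem=1.0, weight_pem_reg=10.0,
                       weight_pem_cls=1.0):
    """Weighted sum of the three BMN loss terms (illustrative sketch)."""
    return (weight_tem * tem_loss
            + weight_pem_reg * pem_reg_loss
            + weight_pem_cls * pem_cls_loss)


# With the defaults, the regression term is weighted ten times more heavily.
total = combine_bmn_losses(tem_loss=1.0, pem_reg_loss=0.2, pem_cls_loss=0.5)
```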

static pem_cls_loss(pred_score, gt_iou_map, mask, threshold=0.9, ratio_range=(1.05, 21), eps=1e-05)

Calculate Proposal Evaluation Module Classification Loss.

Parameters

pred_score (torch.Tensor) – Predicted temporal_iou score by BMN.
gt_iou_map (torch.Tensor) – Groundtruth temporal_iou score.
mask (torch.Tensor) – Boundary-Matching mask.
threshold (float) – Threshold of temporal_iou for positive instances. Default: 0.9.
ratio_range (tuple) – Lower bound and upper bound for ratio. Default: (1.05, 21).
eps (float) – Epsilon for small value. Default: 1e-5.

Returns

Proposal evaluation classification loss.

Return type

torch.Tensor

static pem_reg_loss(pred_score, gt_iou_map, mask, high_temporal_iou_threshold=0.7, low_temporal_iou_threshold=0.3)

Calculate Proposal Evaluation Module Regression Loss.

Parameters

pred_score (torch.Tensor) – Predicted temporal_iou score by BMN.
gt_iou_map (torch.Tensor) – Groundtruth temporal_iou score.
mask (torch.Tensor) – Boundary-Matching mask.
high_temporal_iou_threshold (float) – Higher threshold of temporal_iou. Default: 0.7.
low_temporal_iou_threshold (float) – Lower threshold of temporal_iou. Default: 0.3.

Returns

Proposal evaluation regression loss.

Return type

torch.Tensor

static tem_loss(pred_start, pred_end, gt_start, gt_end)

Calculate Temporal Evaluation Module Loss.

This function calculates the binary_logistic_regression_loss for start and end respectively and returns the sum of their losses.

Parameters

pred_start (torch.Tensor) – Predicted start score by BMN model.
pred_end (torch.Tensor) – Predicted end score by BMN model.
gt_start (torch.Tensor) – Groundtruth confidence score for start.
gt_end (torch.Tensor) – Groundtruth confidence score for end.

Returns

Returned binary logistic loss.

Return type

torch.Tensor

class mmaction.models.BaseHead(num_classes, in_channels, loss_cls={'loss_weight': 1.0, 'type': 'CrossEntropyLoss'}, multi_class=False, label_smooth_eps=0.0)

Base class for head.

All heads should subclass it. All subclasses should override:

Methods: init_weights, initializing weights in some modules.
Methods: forward, supporting forward passes for both training and testing.

Parameters

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’, loss_weight=1.0).
multi_class (bool) – Determines whether it is a multi-class recognition task. Default: False.
label_smooth_eps (float) – Epsilon used in label smooth. Reference: arxiv.org/abs/1906.02629. Default: 0.

abstract forward(x): Defines the computation performed at every call.

abstract init_weights(): Initialize the parameters either from an existing checkpoint or from scratch.

loss(cls_score, labels, **kwargs)[源代码]¶

Calculate the loss given output cls_score, target labels.

参数

cls_score (torch.Tensor) – The output of the model.
labels (torch.Tensor) – The target output of the model.

返回

A dict containing field ‘loss_cls’ (mandatory) and ‘top1_acc’, ‘top5_acc’ (optional).

返回类型

dict

class mmaction.models.BaseRecognizer(backbone, cls_head, neck=None, train_cfg=None, test_cfg=None)[源代码]¶

Base class for recognizers.

All recognizers should subclass it. All subclass should overwrite:

Methods:forward_train, supporting to forward when training.
Methods:forward_test, supporting to forward when testing.

参数

backbone (dict) – Backbone modules to extract feature.
cls_head (dict) – Classification head to process feature.
neck (dict | None) – Neck for feature fusion. Default: None.
train_cfg (dict | None) – Config for training. Default: None.
test_cfg (dict | None) – Config for testing. Default: None.

average_clip(cls_score, num_segs=1)[源代码]¶

Averaging class score over multiple clips.

Using different averaging types (‘score’ or ‘prob’ or None, which is defined in test_cfg) to compute the final averaged class score. Only called in test mode.

参数

cls_score (torch.Tensor) – Class score to be averaged.
num_segs (int) – Number of clips for each input sample.

返回

Averaged class score.

返回类型

torch.Tensor
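
The averaging described above can be sketched in plain Python as follows. This is a simplified stand-in, not the library code: scores are nested lists instead of tensors, and the averaging type is passed as a hypothetical `average_clips` argument instead of being read from test_cfg.

```python
import math


def average_clip(cls_score, num_segs=1, average_clips=None):
    """Average per-clip scores; cls_score is a list of rows, one row per clip."""
    if average_clips is None:
        return cls_score  # no averaging requested
    if average_clips not in ("score", "prob"):
        raise ValueError(f"unsupported averaging type: {average_clips}")

    def softmax(row):
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        return [e / total for e in exps]

    averaged = []
    # Consecutive groups of num_segs rows belong to the same input sample.
    for start in range(0, len(cls_score), num_segs):
        clips = cls_score[start:start + num_segs]
        if average_clips == "prob":
            clips = [softmax(row) for row in clips]
        num_classes = len(clips[0])
        averaged.append(
            [sum(row[c] for row in clips) / len(clips) for c in range(num_classes)]
        )
    return averaged


scores = [[2.0, 0.0], [0.0, 2.0], [4.0, 0.0], [2.0, 0.0]]
print(average_clip(scores, num_segs=2, average_clips="score"))
```

With ‘prob’, each clip is turned into a probability distribution before averaging, so a single clip with very large raw scores cannot dominate the result.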

extract_feat(imgs)[源代码]¶

Extract features through a backbone.

Parameters: imgs (torch.Tensor) – The input images.
Returns: The extracted features.
Return type: torch.Tensor

forward(imgs, label=None, return_loss=True, **kwargs)[源代码]¶: Define the computation performed at every call.

abstract forward_gradcam(imgs)[source]¶: Defines the computation performed at every call when using gradcam utils.

abstract forward_test(imgs)[source]¶: Defines the computation performed at every call during evaluation and testing.

abstract forward_train(imgs, labels, **kwargs)[源代码]¶: Defines the computation performed at every call when training.

init_weights()[源代码]¶: Initialize the model network weights.

train_step(data_batch, optimizer, **kwargs)[源代码]¶

The iteration step during training.

This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, the whole process including back propagation and optimizer updating is also defined in this method, such as GAN.

参数

data_batch (dict) – The output of dataloader.
optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

返回

It should contain at least 3 keys: loss, log_vars, and num_samples. loss is a tensor for back propagation, which can be a weighted sum of multiple losses. log_vars contains all the variables to be sent to the logger. num_samples indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

返回类型

dict
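
To make the role of num_samples concrete, here is a minimal sketch of how a caller might average log_vars over several steps, weighting each step by its batch size. The function `average_logs` and the sample values are hypothetical; in practice loss would be a tensor and the averaging is handled by the runner.

```python
def average_logs(step_outputs):
    """Average log_vars over steps, weighting each step by its num_samples."""
    totals, count = {}, 0
    for out in step_outputs:
        n = out["num_samples"]
        count += n
        for name, value in out["log_vars"].items():
            totals[name] = totals.get(name, 0.0) + value * n
    return {name: total / count for name, total in totals.items()}


# Hypothetical outputs of two training steps with different batch sizes.
outputs = [
    {"loss": 1.0, "log_vars": {"loss": 1.0, "top1_acc": 0.5}, "num_samples": 8},
    {"loss": 0.5, "log_vars": {"loss": 0.5, "top1_acc": 0.75}, "num_samples": 4},
]
print(average_logs(outputs))
```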

val_step(data_batch, optimizer, **kwargs)[源代码]¶

The iteration step during validation.

This method shares the same signature as train_step(), but is used during val epochs. Note that the evaluation after training epochs is not implemented with this method, but with an evaluation hook.

property with_neck¶

Whether the recognizer has a neck.

Type: bool

class mmaction.models.BinaryLogisticRegressionLoss[源代码]¶

Binary Logistic Regression Loss.

It will calculate binary logistic regression loss given reg_score and label.

forward(reg_score, label, threshold=0.5, ratio_range=(1.05, 21), eps=1e-05)[源代码]¶

Calculate Binary Logistic Regression Loss.

参数

reg_score (torch.Tensor) – Predicted score by model.
label (torch.Tensor) – Groundtruth labels.
threshold (float) – Threshold for positive instances. Default: 0.5.
ratio_range (tuple) – Lower bound and upper bound for ratio. Default: (1.05, 21)
eps (float) – Epsilon for small value. Default: 1e-5.

返回

Returned binary logistic loss.

返回类型

torch.Tensor
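
The parameters above can be read against the following plain-Python sketch of a class-balanced binary logistic loss. It follows the weighting scheme common in BSN-style code, where positives and negatives are re-weighted by the clipped ratio of all entries to positive entries; that the library uses exactly these coefficients is an assumption, and inputs are flat lists instead of tensors.

```python
import math


def binary_logistic_regression_loss(reg_score, label, threshold=0.5,
                                    ratio_range=(1.05, 21), eps=1e-5):
    """Class-balanced binary logistic loss over flat lists of scores and labels."""
    pmask = [1.0 if y > threshold else 0.0 for y in label]
    num_positive = max(sum(pmask), 1.0)
    # Ratio of all entries to positives, clipped to ratio_range.
    ratio = len(label) / num_positive
    ratio = min(max(ratio, ratio_range[0]), ratio_range[1])
    coef_neg = 0.5 * ratio / (ratio - 1.0)
    coef_pos = 0.5 * ratio
    total = 0.0
    for p, m in zip(reg_score, pmask):
        total += coef_pos * m * math.log(p + eps)
        total += coef_neg * (1.0 - m) * math.log(1.0 - p + eps)
    return -total / len(label)


good = binary_logistic_regression_loss([0.9, 0.1, 0.2, 0.8], [1, 0, 0, 1])
bad = binary_logistic_regression_loss([0.1, 0.9, 0.8, 0.2], [1, 0, 0, 1])
print(good < bad)
```

Because the lower bound of ratio_range is above 1, the negative coefficient never divides by zero, and eps keeps the logarithms finite when a score is exactly 0 or 1.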

class mmaction.models.C3D(pretrained=None, style='pytorch', conv_cfg=None, norm_cfg=None, act_cfg=None, dropout_ratio=0.5, init_std=0.005)[源代码]¶

C3D backbone.

参数

pretrained (str | None) – Name of pretrained model.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: ‘pytorch’.
conv_cfg (dict | None) – Config dict for convolution layer. If set to None, it uses dict(type='Conv3d') to construct layers. Default: None.
norm_cfg (dict | None) – Config for norm layers. Required keys are type. Default: None.
act_cfg (dict | None) – Config dict for activation layer. If set to None, it uses dict(type='ReLU') to construct layers. Default: None.
dropout_ratio (float) – Probability of dropout layer. Default: 0.5.
init_std (float) – Std value for initialization of fc layers. Default: 0.005.

forward(x)[源代码]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data. The size of x is (num_batches, 3, 16, 112, 112).
Returns: The feature of the input samples extracted by the backbone.
Return type: torch.Tensor

init_weights()[源代码]¶: Initiate the parameters either from existing checkpoint or from scratch.

class mmaction.models.Conv2plus1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, norm_cfg={'type': 'BN3d'})[源代码]¶

(2+1)d Conv module for R(2+1)d backbone.

https://arxiv.org/pdf/1711.11248.pdf.

参数

in_channels (int) – Same as nn.Conv3d.
out_channels (int) – Same as nn.Conv3d.
kernel_size (int | tuple[int]) – Same as nn.Conv3d.
stride (int | tuple[int]) – Same as nn.Conv3d.
padding (int | tuple[int]) – Same as nn.Conv3d.
dilation (int | tuple[int]) – Same as nn.Conv3d.
groups (int) – Same as nn.Conv3d.
bias (bool | str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.
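
For intuition about the (2+1)d factorization, the sketch below computes the intermediate channel count proposed in the referenced paper, chosen so that a spatial convolution followed by a temporal convolution has roughly as many weights as the full 3D convolution it replaces. The function names are hypothetical, and whether this module uses exactly this formula is an assumption.

```python
def mid_channels(in_channels, out_channels, kernel_size=(3, 3, 3)):
    """Intermediate channel count that roughly matches a full 3D conv's weights."""
    t, d_h, d_w = kernel_size
    full = t * d_h * d_w * in_channels * out_channels
    return full // (d_h * d_w * in_channels + t * out_channels)


def param_counts(in_channels, out_channels, kernel_size=(3, 3, 3)):
    """Weights (no bias) of a full 3D conv vs. its (2+1)D factorization."""
    t, d_h, d_w = kernel_size
    m = mid_channels(in_channels, out_channels, kernel_size)
    full_3d = t * d_h * d_w * in_channels * out_channels
    spatial = d_h * d_w * in_channels * m   # 1 x d x d convolution
    temporal = t * m * out_channels         # t x 1 x 1 convolution
    return full_3d, spatial + temporal


print(mid_channels(64, 64))
print(param_counts(64, 64))
```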

forward(x)[源代码]¶

Defines the computation performed at every call.

参数: x (torch.Tensor) – The input data.
返回: The output of the module.
返回类型: torch.Tensor

init_weights()[源代码]¶: Initiate the parameters from scratch.

class mmaction.models.ConvAudio(in_channels, out_channels, kernel_size, op='concat', stride=1, padding=0, dilation=1, groups=1, bias=False)[源代码]¶

Conv2d module for AudioResNet backbone.

https://arxiv.org/abs/2001.08740.

参数

in_channels (int) – Same as nn.Conv2d.
out_channels (int) – Same as nn.Conv2d.
kernel_size (int | tuple[int]) – Same as nn.Conv2d.
op (string) – Operation to merge the output of freq and time feature map. Choices are ‘sum’ and ‘concat’. Default: ‘concat’.
stride (int | tuple[int]) – Same as nn.Conv2d.
padding (int | tuple[int]) – Same as nn.Conv2d.
dilation (int | tuple[int]) – Same as nn.Conv2d.
groups (int) – Same as nn.Conv2d.
bias (bool | str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.

forward(x)[源代码]¶

Defines the computation performed at every call.

参数: x (torch.Tensor) – The input data.
返回: The output of the module.
返回类型: torch.Tensor

init_weights()[源代码]¶: Initiate the parameters from scratch.

class mmaction.models.CrossEntropyLoss(loss_weight=1.0, class_weight=None)[源代码]¶

Cross Entropy Loss.

Support two kinds of labels and their corresponding loss type. It’s worth mentioning that the loss type will be detected by the shape of cls_score and label.

1) Hard label: This label is an integer array and all of the elements are in the range [0, num_classes - 1]. This label’s shape should be cls_score’s shape with the num_classes dimension removed.
2) Soft label (probability distribution over classes): This label is a probability distribution and all of the elements are in the range [0, 1]. This label’s shape must be the same as cls_score. For now, only 2-dim soft label is supported.

参数

loss_weight (float) – Factor scalar multiplied on the loss. Default: 1.0.
class_weight (list[float] | None) – Loss weight for each class. If set as None, use the same weight 1 for all classes. Only applies to CrossEntropyLoss and BCELossWithLogits (should not be set when using other losses). Default: None.
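
The difference between hard and soft labels can be illustrated with a small plain-Python sketch. It is a simplification under stated assumptions: the label kind is detected by element type instead of tensor shape, class_weight and loss_weight are left out, and the helper names are hypothetical.

```python
import math


def log_softmax(row):
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]


def cross_entropy(cls_score, label):
    """Mean cross entropy; label is a list of ints (hard) or of rows (soft)."""
    losses = []
    for row, target in zip(cls_score, label):
        log_probs = log_softmax(row)
        if isinstance(target, int):
            # Hard label: pick the log-probability of the target class.
            losses.append(-log_probs[target])
        else:
            # Soft label: expectation of -log p under the target distribution.
            losses.append(-sum(t * lp for t, lp in zip(target, log_probs)))
    return sum(losses) / len(losses)


scores = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
hard = cross_entropy(scores, [0, 1])
soft = cross_entropy(scores, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(abs(hard - soft) < 1e-12)
```

A soft label that puts all of its mass on one class reduces to the hard-label case, which is what the final comparison checks.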

class mmaction.models.FBOHead(lfb_cfg, fbo_cfg, temporal_pool_type='avg', spatial_pool_type='max')[源代码]¶

Feature Bank Operator Head.

Add feature bank operator for the spatiotemporal detection model to fuse short-term features and long-term features.

参数

lfb_cfg (Dict) – The config dict for LFB which is used to sample long-term features.
fbo_cfg (Dict) – The config dict for feature bank operator (FBO). The type of FBO is also specified in the config dict, and the supported FBO types are listed in fbo_dict.
temporal_pool_type (str) – The temporal pool type. Choices are ‘avg’ or ‘max’. Default: ‘avg’.
spatial_pool_type (str) – The spatial pool type. Choices are ‘avg’ or ‘max’. Default: ‘max’.

forward(x, rois, img_metas)[源代码]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights(pretrained=None)[源代码]¶

Initialize the weights in the module.

参数: pretrained (str, optional) – Path to pre-trained weights. Default: None.

sample_lfb(rois, img_metas)[源代码]¶: Sample long-term features for each ROI feature.

class mmaction.models.HVULoss(categories=('action', 'attribute', 'concept', 'event', 'object', 'scene'), category_nums=(739, 117, 291, 69, 1678, 248), category_loss_weights=(1, 1, 1, 1, 1, 1), loss_type='all', with_mask=False, reduction='mean', loss_weight=1.0)[源代码]¶

Calculate the BCELoss for HVU.

参数

categories (tuple[str]) – Names of tag categories, tags are organized in this order. Default: (‘action’, ‘attribute’, ‘concept’, ‘event’, ‘object’, ‘scene’).
category_nums (tuple[int]) – Number of tags for each category. Default: (739, 117, 291, 69, 1678, 248).
category_loss_weights (tuple[float]) – Loss weights of categories, it applies only if loss_type == ‘individual’. The loss weights will be normalized so that the sum equals 1, so you can give any positive number as loss weight. Default: (1, 1, 1, 1, 1, 1).
loss_type (str) – The loss type we calculate, we can either calculate the BCELoss for all tags, or calculate the BCELoss for tags in each category. Choices are ‘individual’ or ‘all’. Default: ‘all’.
with_mask (bool) – Some tag categories are missing for some video clips. If with_mask == True, we will not calculate loss for these missing categories. Otherwise, these missing categories are treated as negative samples. Default: False.
reduction (str) – Reduction way. Choices are ‘mean’ or ‘sum’. Default: ‘mean’.
loss_weight (float) – The loss weight. Default: 1.0.
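
The sketch below shows one way to read categories, category_nums and category_loss_weights together: it derives the index range each category would occupy in a concatenated tag vector and normalizes the weights so that they sum to 1. It assumes the tags of each category are stored contiguously in the listed order; the variable names `slices` and `weights` are hypothetical.

```python
from itertools import accumulate

categories = ("action", "attribute", "concept", "event", "object", "scene")
category_nums = (739, 117, 291, 69, 1678, 248)
category_loss_weights = (1, 1, 1, 1, 1, 1)

# Start offset of each category inside the concatenated tag vector.
starts = [0] + list(accumulate(category_nums))[:-1]
slices = {
    name: (start, start + num)
    for name, start, num in zip(categories, starts, category_nums)
}

# Normalize the weights so that they sum to 1.
total = sum(category_loss_weights)
weights = {name: w / total for name, w in zip(categories, category_loss_weights)}

print(slices["object"])
print(weights["action"])
```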

class mmaction.models.I3DHead(num_classes, in_channels, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', dropout_ratio=0.5, init_std=0.01, **kwargs)[源代码]¶

Classification head for I3D.

参数

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’)
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
dropout_ratio (float) – Probability of dropout layer. Default: 0.5.
init_std (float) – Std value for initialization. Default: 0.01.
kwargs (dict, optional) – Any keyword argument to be used to initialize the head.

forward(x)[源代码]¶

Defines the computation performed at every call.

参数: x (torch.Tensor) – The input data.
返回: The classification scores for input samples.
返回类型: torch.Tensor

init_weights()[源代码]¶: Initiate the parameters from scratch.

class mmaction.models.LFB(lfb_prefix_path, max_num_sampled_feat=5, window_size=60, lfb_channels=2048, dataset_modes=('train', 'val'), device='gpu', lmdb_map_size=4000000000.0, construct_lmdb=True)[源代码]¶

Long-Term Feature Bank (LFB).

LFB is proposed in Long-Term Feature Banks for Detailed Video Understanding

The ROI features of videos are stored in the feature bank. The feature bank was generated by inferring with a lfb infer config.

Formally, LFB is a Dict whose keys are video IDs and its values are also Dicts whose keys are timestamps in seconds. Example of LFB:
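
A minimal sketch of this nested layout, using a made-up video ID and tiny feature vectors in place of real ROI feature tensors (which would have lfb_channels entries each). The `sample_window` helper is hypothetical and assumes a window centered on the query timestamp, which may differ from how the library samples.

```python
# Hypothetical video ID and tiny feature vectors, for illustration only.
lfb = {
    "video_0001": {
        901: [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # two ROI features at t = 901 s
        902: [[0.7, 0.8, 0.9]],                   # one ROI feature at t = 902 s
    },
}


def sample_window(bank, video_id, timestamp, window_size):
    """Collect features whose timestamps fall in a window centered on timestamp."""
    half = window_size // 2
    feats = []
    for sec in range(timestamp - half, timestamp + half + 1):
        feats.extend(bank[video_id].get(sec, []))
    return feats


print(len(sample_window(lfb, "video_0001", 901, window_size=2)))
```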

参数

lfb_prefix_path (str) – The storage path of lfb.
max_num_sampled_feat (int) – The max number of sampled features. Default: 5.
window_size (int) – Window size of sampling long term feature. Default: 60.
lfb_channels (int) – Number of the channels of the features stored in LFB. Default: 2048.
dataset_modes (tuple[str] | str) – Load LFB of datasets with different modes, such as training, validation, testing datasets. If you don’t do cross validation during training, just load the training dataset i.e. setting dataset_modes = (‘train’). Default: (‘train’, ‘val’).
device (str) – Where to load lfb. Choices are ‘gpu’, ‘cpu’ and ‘lmdb’. A 1.65GB half-precision ava lfb (including training and validation) occupies about 2GB GPU memory. Default: ‘gpu’.
lmdb_map_size (int) – Map size of lmdb. Default: 4e9.
construct_lmdb (bool) – Whether to construct lmdb. If you have constructed lmdb of lfb, you can set to False to skip the construction. Default: True.

class mmaction.models.LFBInferHead(lfb_prefix_path, dataset_mode='train', use_half_precision=True, temporal_pool_type='avg', spatial_pool_type='max')[源代码]¶

Long-Term Feature Bank Infer Head.

This head is used to derive and save the LFB without affecting the input.

参数

lfb_prefix_path (str) – The prefix path to store the lfb.
dataset_mode (str, optional) – Which dataset to be inferred. Choices are ‘train’, ‘val’ or ‘test’. Default: ‘train’.
use_half_precision (bool, optional) – Whether to store the half-precision roi features. Default: True.
temporal_pool_type (str) – The temporal pool type. Choices are ‘avg’ or ‘max’. Default: ‘avg’.
spatial_pool_type (str) – The spatial pool type. Choices are ‘avg’ or ‘max’. Default: ‘max’.

forward(x, rois, img_metas)[源代码]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmaction.models.MobileNetV2(pretrained=None, widen_factor=1.0, out_indices=(7,), frozen_stages=-1, conv_cfg={'type': 'Conv'}, norm_cfg={'requires_grad': True, 'type': 'BN2d'}, act_cfg={'inplace': True, 'type': 'ReLU6'}, norm_eval=False, with_cp=False)[source]¶

MobileNetV2 backbone.

参数

pretrained (str | None) – Name of pretrained model. Default: None.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.
out_indices (None or Sequence[int]) – Output from which stages. Default: (7, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Default: dict(type=’Conv’).
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN2d’, requires_grad=True).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’, inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

forward(x)[源代码]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

make_layer(out_channels, num_blocks, stride, expand_ratio)[源代码]¶

Stack InvertedResidual blocks to build a layer for MobileNetV2.

参数

out_channels (int) – Output channels of each block.
num_blocks (int) – Number of blocks.
stride (int) – Stride of the first block.
expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio.

train(mode=True)[源代码]¶

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

参数: mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
返回: self
返回类型: Module

class mmaction.models.MobileNetV2TSM(num_segments=8, is_shift=True, shift_div=8, **kwargs)[源代码]¶

MobileNetV2 backbone for TSM.

参数

num_segments (int) – Number of frame segments. Default: 8.
is_shift (bool) – Whether to make temporal shift in reset layers. Default: True.
shift_div (int) – Number of div for shift. Default: 8.
**kwargs (keyword arguments, optional) – Arguments for MobileNetV2.
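
To show what num_segments and shift_div mean for a temporal shift, the sketch below shifts a fraction of the channels along the time axis for a single video, using nested lists indexed as x[t][c]. It follows the general idea of the temporal shift operation (one fold of channels shifted toward earlier segments, one fold toward later segments, the rest untouched); the function name and the exact fold layout are assumptions.

```python
def temporal_shift(x, shift_div=8):
    """Shift channels along time; x is indexed as x[t][c] for one video."""
    num_segments, channels = len(x), len(x[0])
    fold = channels // shift_div
    out = [row[:] for row in x]
    for t in range(num_segments):
        for c in range(fold):
            # First fold: take the value from the next segment (zero at the end).
            out[t][c] = x[t + 1][c] if t + 1 < num_segments else 0
        for c in range(fold, 2 * fold):
            # Second fold: take the value from the previous segment (zero at the start).
            out[t][c] = x[t - 1][c] if t > 0 else 0
    return out


x = [[1, 2, 3, 4], [11, 12, 13, 14], [21, 22, 23, 24]]
print(temporal_shift(x, shift_div=4))
```

A larger shift_div moves a smaller share of the channels, so less information is exchanged between neighbouring segments.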

init_weights()[源代码]¶: Initiate the parameters either from existing checkpoint or from scratch.

make_temporal_shift()[源代码]¶: Make temporal shift for some layers.

class mmaction.models.NLLLoss(loss_weight=1.0)[源代码]¶

NLL Loss.

It will calculate NLL loss given cls_score and label.

class mmaction.models.OHEMHingeLoss[源代码]¶

This class is the core implementation for the completeness loss in the paper.

It computes class-wise hinge loss and performs online hard example mining (OHEM).

static backward(ctx, grad_output)[源代码]¶

Defines a formula for differentiating the operation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by as many outputs as forward() returned, and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, pred, labels, is_positive, ohem_ratio, group_size)[源代码]¶

Calculate OHEM hinge loss.

参数

pred (torch.Tensor) – Predicted completeness score.
labels (torch.Tensor) – Groundtruth class label.
is_positive (int) – Set to 1 when proposals are positive and set to -1 when proposals are incomplete.
ohem_ratio (float) – Ratio of hard examples.
group_size (int) – Number of proposals sampled per video.

返回

Returned class-wise hinge loss.

返回类型

torch.Tensor
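
The forward computation can be sketched in plain Python as below: a hinge loss is computed per proposal on the score of its labelled class, proposals are grouped per video, and only the hardest fraction of each group contributes to the sum. This is a sketch under assumptions: class labels are taken as 0-based indices, inputs are nested lists, and no gradient bookkeeping is done.

```python
def ohem_hinge_loss(pred, labels, is_positive, ohem_ratio, group_size):
    """Sum of the hardest hinge losses per group; pred is indexed as pred[i][class]."""
    losses = [
        max(0.0, 1.0 - is_positive * pred[i][labels[i]])
        for i in range(len(pred))
    ]
    keep = int(group_size * ohem_ratio)
    total = 0.0
    for start in range(0, len(losses), group_size):
        group = sorted(losses[start:start + group_size], reverse=True)
        total += sum(group[:keep])  # keep only the hardest examples
    return total


pred = [[0.2, 0.0], [1.5, 0.0], [-0.3, 0.0], [0.9, 0.0]]
labels = [0, 0, 0, 0]
print(ohem_hinge_loss(pred, labels, is_positive=1, ohem_ratio=0.5, group_size=4))
```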

class mmaction.models.PEM(pem_feat_dim, pem_hidden_dim, pem_u_ratio_m, pem_u_ratio_l, pem_high_temporal_iou_threshold, pem_low_temporal_iou_threshold, soft_nms_alpha, soft_nms_low_threshold, soft_nms_high_threshold, post_process_top_k, feature_extraction_interval=16, fc1_ratio=0.1, fc2_ratio=0.1, output_dim=1)[源代码]¶

Proposals Evaluation Model for Boundary Sensitive Network.

Please refer to BSN: Boundary Sensitive Network for Temporal Action Proposal Generation.

Code reference https://github.com/wzmsltw/BSN-boundary-sensitive-network

参数

pem_feat_dim (int) – Feature dimension.
pem_hidden_dim (int) – Hidden layer dimension.
pem_u_ratio_m (float) – Ratio for medium score proposals to balance data.
pem_u_ratio_l (float) – Ratio for low score proposals to balance data.
pem_high_temporal_iou_threshold (float) – High IoU threshold.
pem_low_temporal_iou_threshold (float) – Low IoU threshold.
soft_nms_alpha (float) – Soft NMS alpha.
soft_nms_low_threshold (float) – Soft NMS low threshold.
soft_nms_high_threshold (float) – Soft NMS high threshold.
post_process_top_k (int) – Top k proposals in post process.
feature_extraction_interval (int) – Interval used in feature extraction. Default: 16.
fc1_ratio (float) – Ratio for fc1 layer output. Default: 0.1.
fc2_ratio (float) – Ratio for fc2 layer output. Default: 0.1.
output_dim (int) – Output dimension. Default: 1.

forward(bsp_feature, reference_temporal_iou=None, tmin=None, tmax=None, tmin_score=None, tmax_score=None, video_meta=None, return_loss=True)[源代码]¶: Define the computation performed at every call.

forward_test(bsp_feature, tmin, tmax, tmin_score, tmax_score, video_meta)[源代码]¶: Define the computation performed at every call when testing.

forward_train(bsp_feature, reference_temporal_iou)[源代码]¶: Define the computation performed at every call when training.

class mmaction.models.ResNet(depth, pretrained=None, torchvision_pretrain=True, in_channels=3, num_stages=4, out_indices=(3,), strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), style='pytorch', frozen_stages=-1, conv_cfg={'type': 'Conv'}, norm_cfg={'requires_grad': True, 'type': 'BN2d'}, act_cfg={'inplace': True, 'type': 'ReLU'}, norm_eval=False, partial_bn=False, with_cp=False)[source]¶

ResNet backbone.

参数

depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
pretrained (str | None) – Name of pretrained model. Default: None.
in_channels (int) – Channel num of input features. Default: 3.
num_stages (int) – Resnet stages. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage.
out_indices (Sequence[int]) – Indices of output feature. Default: (3, ).
dilations (Sequence[int]) – Dilation of each stage.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: pytorch.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict) – Config for conv layers. Default: dict(type=’Conv’).
norm_cfg (dict) – Config for norm layers. Required keys are type and requires_grad. Default: dict(type=’BN2d’, requires_grad=True).
act_cfg (dict) – Config for activation layers. Default: dict(type=’ReLU’, inplace=True).
norm_eval (bool) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var). Default: False.
partial_bn (bool) – Whether to use partial bn. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
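
As a hedged illustration of how these arguments might be written in a config, the fragment below spells out a backbone dict using the documented parameter names. The `type` key and the checkpoint string are assumptions about how the surrounding config system refers to the class and to pretrained weights; the remaining values repeat the documented defaults.

```python
# Hypothetical backbone config; only the parameter names come from the list above.
backbone = dict(
    type="ResNet",                        # assumed registry name of the class
    depth=50,
    pretrained="torchvision://resnet50",  # assumed checkpoint identifier
    out_indices=(3,),
    norm_cfg=dict(type="BN2d", requires_grad=True),
    norm_eval=False,
    partial_bn=False,
)
print(backbone["depth"])
```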

forward(x)[源代码]¶

Defines the computation performed at every call.

参数: x (torch.Tensor) – The input data.
返回: The feature of the input samples extracted by the backbone.
返回类型: torch.Tensor

init_weights()[源代码]¶: Initiate the parameters either from existing checkpoint or from scratch.

train(mode=True)[源代码]¶: Set the optimization status when training.

class mmaction.models.ResNet2Plus1d(*args, **kwargs)[源代码]¶

ResNet (2+1)d backbone.

This model is proposed in A Closer Look at Spatiotemporal Convolutions for Action Recognition

forward(x)[源代码]¶

Defines the computation performed at every call.

参数: x (torch.Tensor) – The input data.
返回: The feature of the input samples extracted by the backbone.
返回类型: torch.Tensor

class mmaction.models.ResNet3d(depth, pretrained, pretrained2d=True, in_channels=3, num_stages=4, base_channels=64, out_indices=(3,), spatial_strides=(1, 2, 2, 2), temporal_strides=(1, 1, 1, 1), dilations=(1, 1, 1, 1), conv1_kernel=(5, 7, 7), conv1_stride_t=2, pool1_stride_t=2, with_pool2=True, style='pytorch', frozen_stages=-1, inflate=(1, 1, 1, 1), inflate_style='3x1x1', conv_cfg={'type': 'Conv3d'}, norm_cfg={'requires_grad': True, 'type': 'BN3d'}, act_cfg={'inplace': True, 'type': 'ReLU'}, norm_eval=False, with_cp=False, non_local=(0, 0, 0, 0), non_local_cfg={}, zero_init_residual=True, **kwargs)[source]¶

ResNet 3d backbone.

参数

depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
pretrained (str | None) – Name of pretrained model.
pretrained2d (bool) – Whether to load pretrained 2D model. Default: True.
in_channels (int) – Channel num of input features. Default: 3.
base_channels (int) – Channel num of stem output features. Default: 64.
out_indices (Sequence[int]) – Indices of output feature. Default: (3, ).
num_stages (int) – Resnet stages. Default: 4.
spatial_strides (Sequence[int]) – Spatial strides of residual blocks of each stage. Default: (1, 2, 2, 2).
temporal_strides (Sequence[int]) – Temporal strides of residual blocks of each stage. Default: (1, 1, 1, 1).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
conv1_kernel (Sequence[int]) – Kernel size of the first conv layer. Default: (5, 7, 7).
conv1_stride_t (int) – Temporal stride of the first conv layer. Default: 2.
pool1_stride_t (int) – Temporal stride of the first pooling layer. Default: 2.
with_pool2 (bool) – Whether to use pool2. Default: True.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: ‘pytorch’.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.
inflate (Sequence[int]) – Inflate Dims of each block. Default: (1, 1, 1, 1).
inflate_style (str) – 3x1x1 or 1x1x1, which determines the kernel sizes and padding strides for conv1 and conv2 in each block. Default: ‘3x1x1’.
conv_cfg (dict) – Config for conv layers. Required keys are type. Default: dict(type='Conv3d').
norm_cfg (dict) – Config for norm layers. Required keys are type and requires_grad. Default: dict(type='BN3d', requires_grad=True).
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU', inplace=True).
norm_eval (bool) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var). Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
non_local (Sequence[int]) – Determine whether to apply non-local module in the corresponding block of each stages. Default: (0, 0, 0, 0).
non_local_cfg (dict) – Config for non-local module. Default: dict().
zero_init_residual (bool) – Whether to use zero initialization for residual block. Default: True.
kwargs (dict, optional) – Key arguments for “make_res_layer”.

forward(x)[源代码]¶

Defines the computation performed at every call.

参数: x (torch.Tensor) – The input data.
返回: The feature of the input samples extracted by the backbone.
返回类型: torch.Tensor

static make_res_layer(block, inplanes, planes, blocks, spatial_stride=1, temporal_stride=1, dilation=1, style='pytorch', inflate=1, inflate_style='3x1x1', non_local=0, non_local_cfg={}, norm_cfg=None, act_cfg=None, conv_cfg=None, with_cp=False, **kwargs)[源代码]¶

Build residual layer for ResNet3D.

参数

block (nn.Module) – Residual module to be built.
inplanes (int) – Number of channels for the input feature in each block.
planes (int) – Number of channels for the output feature in each block.
blocks (int) – Number of residual blocks.
spatial_stride (int | Sequence[int]) – Spatial strides in residual and conv layers. Default: 1.
temporal_stride (int | Sequence[int]) – Temporal strides in residual and conv layers. Default: 1.
dilation (int) – Spacing between kernel elements. Default: 1.
style (str) – pytorch or caffe. If set to pytorch, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: pytorch.
inflate (int | Sequence[int]) – Determine whether to inflate for each block. Default: 1.
inflate_style (str) – 3x1x1 or 1x1x1, which determines the kernel sizes and padding strides for conv1 and conv2 in each block. Default: ‘3x1x1’.
non_local (int | Sequence[int]) – Determine whether to apply non-local module in the corresponding block of each stages. Default: 0.
non_local_cfg (dict) – Config for non-local module. Default: dict().
conv_cfg (dict | None) – Config for conv layers. Default: None.
norm_cfg (dict | None) – Config for norm layers. Default: None.
act_cfg (dict | None) – Config for activation layers. Default: None.
with_cp (bool | None) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

返回

A residual layer for the given config.

返回类型

nn.Module

train(mode=True)[源代码]¶: Set the optimization status when training.

class mmaction.models.ResNet3dCSN(depth, pretrained, temporal_strides=(1, 2, 2, 2), conv1_kernel=(3, 7, 7), conv1_stride_t=1, pool1_stride_t=1, norm_cfg={'eps': 0.001, 'requires_grad': True, 'type': 'BN3d'}, inflate_style='3x3x3', bottleneck_mode='ir', bn_frozen=False, **kwargs)[源代码]¶

ResNet backbone for CSN.

参数

depth (int) – Depth of ResNetCSN, from {18, 34, 50, 101, 152}.
pretrained (str | None) – Name of pretrained model.
temporal_strides (tuple[int]) – Temporal strides of residual blocks of each stage. Default: (1, 2, 2, 2).
conv1_kernel (tuple[int]) – Kernel size of the first conv layer. Default: (3, 7, 7).
conv1_stride_t (int) – Temporal stride of the first conv layer. Default: 1.
pool1_stride_t (int) – Temporal stride of the first pooling layer. Default: 1.
norm_cfg (dict) – Config for norm layers. Required keys are type and requires_grad. Default: dict(type=’BN3d’, requires_grad=True, eps=1e-3).
inflate_style (str) – 3x1x1 or 1x1x1, which determines the kernel sizes and padding strides for conv1 and conv2 in each block. Default: ‘3x3x3’.
bottleneck_mode (str) – Determine which ways to factorize a 3D bottleneck block using channel-separated convolutional networks. If set to ‘ip’, it will replace the 3x3x3 conv2 layer with a 1x1x1 traditional convolution and a 3x3x3 depthwise convolution, i.e., Interaction-preserved channel-separated bottleneck block. If set to ‘ir’, it will replace the 3x3x3 conv2 layer with a 3x3x3 depthwise convolution, which is derived from the interaction-preserved bottleneck block by removing the extra 1x1x1 convolution, i.e., Interaction-reduced channel-separated bottleneck block. Default: ‘ir’.
kwargs (dict, optional) – Key arguments for “make_res_layer”.
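
To compare the two bottleneck_mode options, the sketch below counts the weights of the conv2 stage for an ordinary 3x3x3 convolution, the ‘ip’ variant and the ‘ir’ variant, following the descriptions above. The function name is hypothetical, biases and normalization layers are ignored, and input and output channel counts are assumed equal.

```python
def conv2_weights(channels, mode):
    """Weight count (no bias) of the conv2 stage for a given bottleneck variant."""
    depthwise = 3 * 3 * 3 * channels            # 3x3x3 depthwise convolution
    if mode == "standard":
        return 3 * 3 * 3 * channels * channels  # ordinary 3x3x3 convolution
    if mode == "ip":
        return channels * channels + depthwise  # 1x1x1 conv followed by depthwise
    if mode == "ir":
        return depthwise                        # depthwise convolution only
    raise ValueError(f"unknown mode: {mode}")


for mode in ("standard", "ip", "ir"):
    print(mode, conv2_weights(64, mode))
```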

train(mode=True)[源代码]¶: Set the optimization status when training.

class mmaction.models.ResNet3dLayer(depth, pretrained, pretrained2d=True, stage=3, base_channels=64, spatial_stride=2, temporal_stride=1, dilation=1, style='pytorch', all_frozen=False, inflate=1, inflate_style='3x1x1', conv_cfg={'type': 'Conv3d'}, norm_cfg={'requires_grad': True, 'type': 'BN3d'}, act_cfg={'inplace': True, 'type': 'ReLU'}, norm_eval=False, with_cp=False, zero_init_residual=True, **kwargs)[源代码]¶

ResNet 3d Layer.

参数

depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
pretrained (str | None) – Name of pretrained model.
pretrained2d (bool) – Whether to load pretrained 2D model. Default: True.
stage (int) – The index of Resnet stage. Default: 3.
base_channels (int) – Channel num of stem output features. Default: 64.
spatial_stride (int) – The 1st res block’s spatial stride. Default: 2.
temporal_stride (int) – The 1st res block’s temporal stride. Default: 1.
dilation (int) – The dilation. Default: 1.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: ‘pytorch’.
all_frozen (bool) – Freeze all modules in the layer. Default: False.
inflate (int) – Inflate Dims of each block. Default: 1.
inflate_style (str) – 3x1x1 or 1x1x1, which determines the kernel sizes and padding strides for conv1 and conv2 in each block. Default: ‘3x1x1’.
conv_cfg (dict) – Config for conv layers. Required keys are type. Default: dict(type='Conv3d').
norm_cfg (dict) – Config for norm layers. Required keys are type and requires_grad. Default: dict(type='BN3d', requires_grad=True).
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU', inplace=True).
norm_eval (bool) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var). Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero initialization for residual block. Default: True.
kwargs (dict, optional) – Keyword arguments for “make_res_layer”.
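The style parameter above only decides which convolution in a bottleneck block carries the stride. A minimal sketch of that rule follows; the helper name `bottleneck_conv_strides` is hypothetical and is not part of mmaction.

```python
def bottleneck_conv_strides(style, stride):
    """Return (conv1_stride, conv2_stride) for one bottleneck block.

    Hypothetical helper that mirrors the documented rule:
    - 'pytorch': the 3x3 conv (conv2) carries the stride.
    - 'caffe':   the first 1x1 conv (conv1) carries the stride.
    """
    if style not in ("pytorch", "caffe"):
        raise ValueError(f"unsupported style: {style!r}")
    if style == "pytorch":
        return 1, stride
    return stride, 1


print(bottleneck_conv_strides("pytorch", 2))  # (1, 2)
print(bottleneck_conv_strides("caffe", 2))    # (2, 1)
```

Either way the block downsamples by the same factor; only the position of the strided convolution changes.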

forward(x)[源代码]¶

Defines the computation performed at every call.

参数: x (torch.Tensor) – The input data.
返回: The feature of the input samples extracted by the backbone.
返回类型: torch.Tensor

train(mode=True)[源代码]¶: Set the optimization status when training.

class mmaction.models.ResNet3dSlowFast(pretrained, resample_rate=8, speed_ratio=8, channel_ratio=8, slow_pathway={'conv1_kernel': (1, 7, 7), 'conv1_stride_t': 1, 'depth': 50, 'dilations': (1, 1, 1, 1), 'inflate': (0, 0, 1, 1), 'lateral': True, 'pool1_stride_t': 1, 'pretrained': None, 'type': 'resnet3d'}, fast_pathway={'base_channels': 8, 'conv1_kernel': (5, 7, 7), 'conv1_stride_t': 1, 'depth': 50, 'lateral': False, 'pool1_stride_t': 1, 'pretrained': None, 'type': 'resnet3d'})[源代码]¶

Slowfast backbone.

This module is proposed in SlowFast Networks for Video Recognition

参数

pretrained (str) – The file path to a pretrained model.
resample_rate (int) – A large temporal stride resample_rate on input frames. The actual resample rate is calculated by multiplying the interval in SampleFrames in the pipeline by resample_rate, equivalent to the \(\tau\) in the paper, i.e. it processes only one out of resample_rate * interval frames. Default: 8.
speed_ratio (int) – Speed ratio indicating the ratio between time dimension of the fast and slow pathway, corresponding to the \(\alpha\) in the paper. Default: 8.
channel_ratio (int) – Reduce the channel number of fast pathway by channel_ratio, corresponding to \(\beta\) in the paper. Default: 8.
slow_pathway (dict) –
Configuration of the slow branch. It should contain the arguments needed to build the specific type of pathway, plus type (str), the type of backbone the pathway is based on, and lateral (bool), which determines whether to build lateral connections for the pathway. Default:
```
dict(type='resnet3d',
lateral=True, depth=50, pretrained=None,
conv1_kernel=(1, 7, 7), dilations=(1, 1, 1, 1),
conv1_stride_t=1, pool1_stride_t=1, inflate=(0, 0, 1, 1))
```

fast_pathway (dict) –
Configuration of the fast branch, similar to slow_pathway. Default:
```
dict(type='resnet3d',
lateral=False, depth=50, pretrained=None, base_channels=8,
conv1_kernel=(5, 7, 7), conv1_stride_t=1, pool1_stride_t=1)
```
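To make the roles of resample_rate, speed_ratio and channel_ratio concrete, here is a small sketch that works on frame indices only. It assumes the temporal subsampling can be modelled as strided index selection and that the slow pathway stem has 64 channels; both are assumptions for illustration, and the helper names are hypothetical.

```python
def pathway_frame_indices(num_frames, resample_rate=8, speed_ratio=8):
    """Return (slow_indices, fast_indices) picked from the input clip.

    Assumption: subsampling is modelled as taking every k-th frame.
    The slow pathway uses stride resample_rate; the fast pathway uses a
    stride that is speed_ratio times smaller, so it sees more frames.
    """
    slow_stride = resample_rate
    fast_stride = max(resample_rate // speed_ratio, 1)
    slow = list(range(0, num_frames, slow_stride))
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


def fast_pathway_channels(slow_channels, channel_ratio=8):
    """The fast pathway is thinner by a factor of channel_ratio."""
    return slow_channels // channel_ratio


slow, fast = pathway_frame_indices(32)
print(len(slow), len(fast))        # 4 32
print(fast_pathway_channels(64))   # 8
```

With the defaults, the fast pathway sees eight times as many frames as the slow one while using one eighth of the channels, which is consistent with base_channels=8 in the fast_pathway default.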

forward(x)[源代码]¶

Defines the computation performed at every call.

参数

x (torch.Tensor) – The input data.

返回

The feature of the input samples extracted by the backbone.

返回类型

tuple[torch.Tensor]

init_weights(pretrained=None)[源代码]¶: Initiate the parameters either from existing checkpoint or from scratch.

class mmaction.models.ResNet3dSlowOnly(*args, lateral=False, conv1_kernel=(1, 7, 7), conv1_stride_t=1, pool1_stride_t=1, inflate=(0, 0, 1, 1), with_pool2=False, **kwargs)[源代码]¶

SlowOnly backbone based on ResNet3dPathway.

参数

*args (arguments) – Arguments same as ResNet3dPathway.
conv1_kernel (Sequence[int]) – Kernel size of the first conv layer. Default: (1, 7, 7).
conv1_stride_t (int) – Temporal stride of the first conv layer. Default: 1.
pool1_stride_t (int) – Temporal stride of the first pooling layer. Default: 1.
inflate (Sequence[int]) – Inflate Dims of each block. Default: (0, 0, 1, 1).
**kwargs (keyword arguments) – Keyword arguments for ResNet3dPathway.

class mmaction.models.ResNetAudio(depth, pretrained, in_channels=1, num_stages=4, base_channels=32, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), conv1_kernel=9, conv1_stride=1, frozen_stages=- 1, factorize=(1, 1, 0, 0), norm_eval=False, with_cp=False, conv_cfg={'type': 'Conv'}, norm_cfg={'requires_grad': True, 'type': 'BN2d'}, act_cfg={'inplace': True, 'type': 'ReLU'}, zero_init_residual=True)[源代码]¶

ResNet 2d audio backbone. Reference: https://arxiv.org/abs/2001.08740.

参数

depth (int) – Depth of resnet, from {50, 101, 152}.
pretrained (str | None) – Name of pretrained model.
in_channels (int) – Channel num of input features. Default: 1.
base_channels (int) – Channel num of stem output features. Default: 32.
num_stages (int) – Resnet stages. Default: 4.
strides (Sequence[int]) – Strides of residual blocks of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
conv1_kernel (int) – Kernel size of the first conv layer. Default: 9.
conv1_stride (int | tuple[int]) – Stride of the first conv layer. Default: 1.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.
factorize (Sequence[int]) – factorize Dims of each block for audio. Default: (1, 1, 0, 0).
norm_eval (bool) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var). Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
conv_cfg (dict) – Config for conv layers. Default: dict(type='Conv').
norm_cfg (dict) – Config for norm layers. Required keys are type and requires_grad. Default: dict(type='BN2d', requires_grad=True).
act_cfg (dict) – Config for activation layers. Default: dict(type='ReLU', inplace=True).
zero_init_residual (bool) – Whether to use zero initialization for residual block, Default: True.

forward(x)[源代码]¶

Defines the computation performed at every call.

参数: x (torch.Tensor) – The input data.
返回: The feature of the input samples extracted by the backbone.
返回类型: torch.Tensor

init_weights()[源代码]¶: Initiate the parameters either from existing checkpoint or from scratch.

make_res_layer(block, inplanes, planes, blocks, stride=1, dilation=1, factorize=1, norm_cfg=None, with_cp=False)[源代码]¶

Build residual layer for ResNetAudio.

参数

block (nn.Module) – Residual module to be built.
inplanes (int) – Number of channels for the input feature in each block.
planes (int) – Number of channels for the output feature in each block.
blocks (int) – Number of residual blocks.
stride (int) – Stride in the residual layer. Default: 1.
dilation (int) – Spacing between kernel elements. Default: 1.
factorize (int | Sequence[int]) – Determine whether to factorize for each block. Default: 1.
norm_cfg (dict) – Config for norm layers. required keys are type and requires_grad. Default: None.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

返回

A residual layer for the given config.

train(mode=True)[源代码]¶: Set the optimization status when training.

class mmaction.models.ResNetTIN(depth, num_segments=8, is_tin=True, shift_div=4, **kwargs)[源代码]¶

ResNet backbone for TIN.

参数

depth (int) – Depth of ResNet, from {18, 34, 50, 101, 152}.
num_segments (int) – Number of frame segments. Default: 8.
is_tin (bool) – Whether to apply temporal interlace. Default: True.
shift_div (int) – Number of division parts for shift. Default: 4.
kwargs (dict, optional) – Arguments for ResNet.

init_weights()[源代码]¶: Initiate the parameters either from existing checkpoint or from scratch.

make_temporal_interlace()[源代码]¶: Make temporal interlace for some layers.

class mmaction.models.ResNetTSM(depth, num_segments=8, is_shift=True, non_local=(0, 0, 0, 0), non_local_cfg={}, shift_div=8, shift_place='blockres', temporal_pool=False, **kwargs)[源代码]¶

ResNet backbone for TSM.

参数

num_segments (int) – Number of frame segments. Default: 8.
is_shift (bool) – Whether to make temporal shift in resnet layers. Default: True.
non_local (Sequence[int]) – Determine whether to apply non-local module in the corresponding block of each stages. Default: (0, 0, 0, 0).
non_local_cfg (dict) – Config for non-local module. Default: dict().
shift_div (int) – Number of div for shift. Default: 8.
shift_place (str) – Places in resnet layers for shift, which is chosen from [‘block’, ‘blockres’]. If set to ‘block’, it will apply temporal shift to all child blocks in each resnet layer. If set to ‘blockres’, it will apply temporal shift to each conv1 layer of all child blocks in each resnet layer. Default: ‘blockres’.
temporal_pool (bool) – Whether to add temporal pooling. Default: False.
**kwargs (keyword arguments, optional) – Arguments for ResNet.
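The shift_div parameter controls how many channels take part in the temporal shift. The sketch below illustrates the usual TSM idea on plain nested lists: one fold of channels is taken from the next frame, one fold from the previous frame, and the rest are left alone. It is a simplified stand-in written for this page, not the module that make_temporal_shift() installs.

```python
def temporal_shift(frames, shift_div=8):
    """Shift a fraction of channels along the time axis.

    frames: list of T frames, each a list of C channel values.
    fold = C // shift_div channels come from frame t + 1, the next fold
    channels come from frame t - 1, and the remaining channels are kept.
    Positions with no source frame are zero-filled.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // shift_div
    out = [[0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:
                src = t + 1
            elif c < 2 * fold:
                src = t - 1
            else:
                src = t
            if 0 <= src < num_frames:
                out[t][c] = frames[src][c]
    return out


# Three frames, eight channels; value = 10 * frame_index + channel_index.
clip = [[10 * t + c for c in range(8)] for t in range(3)]
shifted = temporal_shift(clip, shift_div=8)
print(shifted[1])  # [20, 1, 12, 13, 14, 15, 16, 17]
```

A larger shift_div means fewer channels are shifted, so less temporal information is mixed in per block.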

init_weights()[源代码]¶: Initiate the parameters either from existing checkpoint or from scratch.

make_temporal_pool()[源代码]¶: Make temporal pooling between layer1 and layer2, using a 3D max pooling layer.

make_temporal_shift()[源代码]¶: Make temporal shift for some layers.

class mmaction.models.SSNLoss[源代码]¶

static activity_loss(activity_score, labels, activity_indexer)[源代码]¶

Activity Loss.

It will calculate activity loss given activity_score and label.

Args:

activity_score (torch.Tensor): Predicted activity score.
labels (torch.Tensor): Groundtruth class label.
activity_indexer (torch.Tensor): Index slices of proposals.

返回: Returned cross entropy loss.
返回类型: torch.Tensor

static classwise_regression_loss(bbox_pred, labels, bbox_targets, regression_indexer)[源代码]¶

Classwise Regression Loss.

It will calculate classwise_regression loss given class_reg_pred and targets.

Args:

bbox_pred (torch.Tensor): Predicted interval center and span of positive proposals.
labels (torch.Tensor): Groundtruth class label.
bbox_targets (torch.Tensor): Groundtruth center and span of positive proposals.
regression_indexer (torch.Tensor): Index slices of positive proposals.

返回: Returned class-wise regression loss.
返回类型: torch.Tensor

static completeness_loss(completeness_score, labels, completeness_indexer, positive_per_video, incomplete_per_video, ohem_ratio=0.17)[源代码]¶

Completeness Loss.

It will calculate completeness loss given completeness_score and label.

Args:

completeness_score (torch.Tensor): Predicted completeness score.
labels (torch.Tensor): Groundtruth class label.
completeness_indexer (torch.Tensor): Index slices of positive and incomplete proposals.
positive_per_video (int): Number of positive proposals sampled per video.
incomplete_per_video (int): Number of incomplete proposals sampled per video.
ohem_ratio (float): Ratio of online hard example mining. Default: 0.17.

返回: Returned class-wise completeness loss.
返回类型: torch.Tensor
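The ohem_ratio argument refers to online hard example mining: only the hardest fraction of examples contributes to the loss. The following sketch shows the general idea with a hinge loss on plain Python floats. It is a generic illustration under that assumption, not the exact computation or normalization used by completeness_loss, and the function name is hypothetical.

```python
def ohem_hinge_loss(scores, sign, ohem_ratio):
    """Mean hinge loss over the hardest ohem_ratio fraction of examples.

    sign is +1 when a high score is desired (positive proposals) and
    -1 when a low score is desired (incomplete proposals).
    Returns (mean_loss_over_kept_examples, number_of_kept_examples).
    """
    losses = [max(0.0, 1.0 - sign * s) for s in scores]
    keep = max(1, int(len(losses) * ohem_ratio))
    hardest = sorted(losses, reverse=True)[:keep]
    return sum(hardest) / keep, keep


# Six incomplete proposals; high scores are mistakes for this group.
incomplete_scores = [2.0, 1.0, 0.0, -2.0, -0.5, -1.0]
loss, kept = ohem_hinge_loss(incomplete_scores, sign=-1, ohem_ratio=0.5)
print(loss, kept)  # 2.0 3
```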

forward(activity_score, completeness_score, bbox_pred, proposal_type, labels, bbox_targets, train_cfg)[源代码]¶

Calculate the SSN loss.

参数

activity_score (torch.Tensor) – Predicted activity score.
completeness_score (torch.Tensor) – Predicted completeness score.
bbox_pred (torch.Tensor) – Predicted interval center and span of positive proposals.
proposal_type (torch.Tensor) – Type index slices of proposals.
labels (torch.Tensor) – Groundtruth class label.
bbox_targets (torch.Tensor) – Groundtruth center and span of positive proposals.
train_cfg (dict) – Config for training.

返回

(loss_activity, loss_completeness, loss_reg). Loss_activity is the activity loss, loss_completeness is the class-wise completeness loss, loss_reg is the class-wise regression loss.

返回类型

dict([torch.Tensor, torch.Tensor, torch.Tensor])

class mmaction.models.SingleRoIExtractor3D(roi_layer_type='RoIAlign', featmap_stride=16, output_size=16, sampling_ratio=0, pool_mode='avg', aligned=True, with_temporal_pool=True, with_global=False)[源代码]¶

Extract RoI features from a single level feature map.

参数

roi_layer_type (str) – Specify the RoI layer type. Default: ‘RoIAlign’.
featmap_stride (int) – Strides of input feature maps. Default: 16.
output_size (int | tuple) – Size or (Height, Width). Default: 16.
sampling_ratio (int) – Number of input samples to take for each output sample. 0 to take samples densely for current models. Default: 0.
pool_mode (str, 'avg' or 'max') – pooling mode in each bin. Default: ‘avg’.
aligned (bool) – if False, use the legacy implementation in MMDetection. If True, align the results more perfectly. Default: True.
with_temporal_pool (bool) – if True, avgpool the temporal dim. Default: True.
with_global (bool) – if True, concatenate the RoI feature with global feature. Default: False.

Note that sampling_ratio, pool_mode, aligned only apply when roi_layer_type is set as RoIAlign.

forward(feat, rois)[源代码]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
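The note above is about the difference between calling a module and calling its forward method directly. The toy class below imitates that behaviour in plain Python; `TinyModule` is a hypothetical, heavily simplified stand-in for a module base class, used only to show why hooks are skipped when forward is called directly.

```python
class TinyModule:
    """Simplified stand-in: __call__ wraps forward and runs the hooks."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def forward(self, x):
        return x * 2

    def __call__(self, x):
        out = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, out)
        return out


seen = []
module = TinyModule()
module.register_forward_hook(lambda mod, inp, out: seen.append(out))

module(3)          # goes through __call__, so the hook records 6
module.forward(4)  # bypasses __call__, so the hook never runs
print(seen)        # [6]
```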

class mmaction.models.SlowFastHead(num_classes, in_channels, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', dropout_ratio=0.8, init_std=0.01, **kwargs)[源代码]¶

The classification head for SlowFast.

参数

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’).
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
dropout_ratio (float) – Probability of dropout layer. Default: 0.8.
init_std (float) – Std value for Initiation. Default: 0.01.
kwargs (dict, optional) – Any keyword argument to be used to initialize the head.

forward(x)[源代码]¶

Defines the computation performed at every call.

参数: x (torch.Tensor) – The input data.
返回: The classification scores for input samples.
返回类型: torch.Tensor

init_weights()[源代码]¶: Initiate the parameters from scratch.

class mmaction.models.TAM(in_channels, num_segments, alpha=2, adaptive_kernel_size=3, beta=4, conv1d_kernel_size=3, adaptive_convolution_stride=1, adaptive_convolution_padding=1, init_std=0.001)[源代码]¶

Temporal Adaptive Module(TAM) for TANet.

This module is proposed in TAM: TEMPORAL ADAPTIVE MODULE FOR VIDEO RECOGNITION

参数

in_channels (int) – Channel num of input features.
num_segments (int) – Number of frame segments.
alpha (int) – `alpha` in the paper and is the ratio of the intermediate channel number to the initial channel number in the global branch. Default: 2.
adaptive_kernel_size (int) – `K` in the paper and is the size of the adaptive kernel size in the global branch. Default: 3.
beta (int) – `beta` in the paper and is set to control the model complexity in the local branch. Default: 4.
conv1d_kernel_size (int) – Size of the convolution kernel of Conv1d in the local branch. Default: 3.
adaptive_convolution_stride (int) – The first dimension of strides in the adaptive convolution of `Temporal Adaptive Aggregation`. Default: 1.
adaptive_convolution_padding (int) – The first dimension of paddings in the adaptive convolution of `Temporal Adaptive Aggregation`. Default: 1.
init_std (float) – Std value for initiation of nn.Linear. Default: 0.001.

forward(x)[源代码]¶

Defines the computation performed at every call.

参数: x (torch.Tensor) – The input data.
返回: The output of the module.
返回类型: torch.Tensor

init_weights()[源代码]¶: Initiate the parameters from scratch.

class mmaction.models.TANet(depth, num_segments, tam_cfg={}, **kwargs)[源代码]¶

Temporal Adaptive Network (TANet) backbone.

This backbone is proposed in TAM: TEMPORAL ADAPTIVE MODULE FOR VIDEO RECOGNITION

Embedding the temporal adaptive module (TAM) into ResNet to instantiate TANet.

参数

depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
num_segments (int) – Number of frame segments.
tam_cfg (dict | None) – Config for temporal adaptive module (TAM). Default: dict().
**kwargs (keyword arguments, optional) – Arguments for ResNet except `depth`.

init_weights()[源代码]¶: Initiate the parameters either from existing checkpoint or from scratch.

make_tam_modeling()[源代码]¶: Replace ResNet-Block with TA-Block.

class mmaction.models.TEM(temporal_dim, boundary_ratio, tem_feat_dim, tem_hidden_dim, tem_match_threshold, loss_cls={'type': 'BinaryLogisticRegressionLoss'}, loss_weight=2, output_dim=3, conv1_ratio=1, conv2_ratio=1, conv3_ratio=0.01)[源代码]¶

Temporal Evaluation Model for Boundary Sensitive Network.

Please refer to BSN: Boundary Sensitive Network for Temporal Action Proposal Generation.

Code reference https://github.com/wzmsltw/BSN-boundary-sensitive-network

参数

tem_feat_dim (int) – Feature dimension.
tem_hidden_dim (int) – Hidden layer dimension.
tem_match_threshold (float) – Temporal evaluation match threshold.
loss_cls (dict) – Config for building loss. Default: dict(type='BinaryLogisticRegressionLoss').
loss_weight (float) – Weight term for action_loss. Default: 2.
output_dim (int) – Output dimension. Default: 3.
conv1_ratio (float) – Ratio of conv1 layer output. Default: 1.0.
conv2_ratio (float) – Ratio of conv2 layer output. Default: 1.0.
conv3_ratio (float) – Ratio of conv3 layer output. Default: 0.01.

forward(raw_feature, gt_bbox=None, video_meta=None, return_loss=True)[源代码]¶: Define the computation performed at every call.

forward_test(raw_feature, video_meta)[源代码]¶: Define the computation performed at every call when testing.

forward_train(raw_feature, label_action, label_start, label_end)[源代码]¶: Define the computation performed at every call when training.

generate_labels(gt_bbox)[源代码]¶: Generate training labels.

class mmaction.models.TPN(in_channels, out_channels, spatial_modulation_cfg=None, temporal_modulation_cfg=None, upsample_cfg=None, downsample_cfg=None, level_fusion_cfg=None, aux_head_cfg=None, flow_type='cascade')[源代码]¶

TPN neck.

This module is proposed in Temporal Pyramid Network for Action Recognition

参数

in_channels (tuple[int]) – Channel numbers of input features tuple.
out_channels (int) – Channel number of output feature.
spatial_modulation_cfg (dict | None) – Config for spatial modulation layers. Required keys are in_channels and out_channels. Default: None.
temporal_modulation_cfg (dict | None) – Config for temporal modulation layers. Default: None.
upsample_cfg (dict | None) – Config for upsample layers. The keys are the same as those in nn.Upsample. Default: None.
downsample_cfg (dict | None) – Config for downsample layers. Default: None.
level_fusion_cfg (dict | None) – Config for level fusion layers. Required keys are ‘in_channels’, ‘mid_channels’, ‘out_channels’. Default: None.
aux_head_cfg (dict | None) – Config for aux head layers. Required keys are ‘out_channels’. Default: None.
flow_type (str) – Flow type to combine the features. Options are ‘cascade’ and ‘parallel’. Default: ‘cascade’.

forward(x, target=None)[源代码]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

注解

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmaction.models.TPNHead(*args, **kwargs)[源代码]¶

Class head for TPN.

参数

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’).
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
consensus (dict) – Consensus config dict.
dropout_ratio (float) – Probability of dropout layer. Default: 0.4.
init_std (float) – Std value for Initiation. Default: 0.01.
multi_class (bool) – Determines whether it is a multi-class recognition task. Default: False.
label_smooth_eps (float) – Epsilon used in label smooth. Reference: https://arxiv.org/abs/1906.02629. Default: 0.
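As a reminder of what label_smooth_eps does, the sketch below applies the standard uniform label smoothing formula to a one-hot target. That the head uses exactly this formula is an assumption here, and `smooth_labels` is a hypothetical helper name.

```python
def smooth_labels(one_hot, eps):
    """Uniform label smoothing: (1 - eps) * y + eps / num_classes."""
    num_classes = len(one_hot)
    return [(1.0 - eps) * y + eps / num_classes for y in one_hot]


target = [0.0, 1.0, 0.0, 0.0]
smoothed = smooth_labels(target, eps=0.1)
print(smoothed)

# With eps = 0 the target is returned unchanged.
print(smooth_labels(target, eps=0.0))
```

The smoothed target still sums to one; a small amount of probability mass is moved from the true class to the others.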

forward(x, num_segs=None, fcn_test=False)[源代码]¶

Defines the computation performed at every call.

参数

x (torch.Tensor) – The input data.
num_segs (int | None) – Number of segments into which a video is divided. Default: None.
fcn_test (bool) – Whether to apply full convolution (fcn) testing. Default: False.

返回

The classification scores for input samples.

返回类型

torch.Tensor

class mmaction.models.TRNHead(num_classes, in_channels, num_segments=8, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', relation_type='TRNMultiScale', hidden_dim=256, dropout_ratio=0.8, init_std=0.001, **kwargs)[源代码]¶

Class head for TRN.

参数

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
num_segments (int) – Number of frame segments. Default: 8.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’)
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
relation_type (str) – The relation module type. Choices are ‘TRN’ or ‘TRNMultiScale’. Default: ‘TRNMultiScale’.
hidden_dim (int) – The dimension of hidden layer of MLP in relation module. Default: 256.
dropout_ratio (float) – Probability of dropout layer. Default: 0.8.
init_std (float) – Std value for Initiation. Default: 0.001.
kwargs (dict, optional) – Any keyword argument to be used to initialize the head.

forward(x, num_segs)[源代码]¶

Defines the computation performed at every call.

参数

x (torch.Tensor) – The input data.
num_segs (int) – Unused in TRNHead. By default, num_segs is equal to clip_len * num_clips * num_crops, which is automatically generated in the Recognizer forward phase and is not needed in TRN models. The self.num_segments we need is a hyperparameter used to build TRN models.

返回

The classification scores for input samples.

返回类型

torch.Tensor

init_weights()[源代码]¶: Initiate the parameters from scratch.

class mmaction.models.TSMHead(num_classes, in_channels, num_segments=8, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', consensus={'dim': 1, 'type': 'AvgConsensus'}, dropout_ratio=0.8, init_std=0.001, is_shift=True, temporal_pool=False, **kwargs)[源代码]¶

Class head for TSM.

参数

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
num_segments (int) – Number of frame segments. Default: 8.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’)
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
consensus (dict) – Consensus config dict.
dropout_ratio (float) – Probability of dropout layer. Default: 0.8.
init_std (float) – Std value for Initiation. Default: 0.001.
is_shift (bool) – Indicating whether the feature is shifted. Default: True.
temporal_pool (bool) – Indicating whether feature is temporal pooled. Default: False.
kwargs (dict, optional) – Any keyword argument to be used to initialize the head.

forward(x, num_segs)[源代码]¶

Defines the computation performed at every call.

参数

x (torch.Tensor) – The input data.
num_segs (int) – Unused in TSMHead. By default, num_segs is equal to clip_len * num_clips * num_crops, which is automatically generated in the Recognizer forward phase and is not needed in TSM models. The self.num_segments we need is a hyperparameter used to build TSM models.

返回

The classification scores for input samples.

返回类型

torch.Tensor

init_weights()[源代码]¶: Initiate the parameters from scratch.

class mmaction.models.TSNHead(num_classes, in_channels, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', consensus={'dim': 1, 'type': 'AvgConsensus'}, dropout_ratio=0.4, init_std=0.01, **kwargs)[源代码]¶

Class head for TSN.

参数

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’).
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
consensus (dict) – Consensus config dict.
dropout_ratio (float) – Probability of dropout layer. Default: 0.4.
init_std (float) – Std value for Initiation. Default: 0.01.
kwargs (dict, optional) – Any keyword argument to be used to initialize the head.

forward(x, num_segs)[源代码]¶

Defines the computation performed at every call.

参数

x (torch.Tensor) – The input data.
num_segs (int) – Number of segments into which a video is divided.

返回

The classification scores for input samples.

返回类型

torch.Tensor
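The default consensus, dict(type='AvgConsensus', dim=1), averages per-segment scores for each video. The sketch below reproduces that step on nested lists, assuming the scores arrive as N * num_segs rows ordered video by video; the function is a hypothetical illustration, not the head's implementation.

```python
def avg_consensus(scores, num_segs):
    """Average class scores over the segments of each video.

    scores: list of N * num_segs rows, each a list of class scores.
    Returns N rows, one averaged score vector per video.
    """
    if len(scores) % num_segs != 0:
        raise ValueError("number of rows must be a multiple of num_segs")
    averaged = []
    for start in range(0, len(scores), num_segs):
        group = scores[start:start + num_segs]
        averaged.append([sum(col) / num_segs for col in zip(*group)])
    return averaged


# Two videos, two segments each, two classes.
segment_scores = [[1.0, 3.0], [3.0, 5.0], [0.0, 2.0], [2.0, 4.0]]
print(avg_consensus(segment_scores, num_segs=2))  # [[2.0, 4.0], [1.0, 3.0]]
```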

init_weights()[源代码]¶: Initiate the parameters from scratch.

class mmaction.models.X3D(gamma_w=1.0, gamma_b=1.0, gamma_d=1.0, pretrained=None, in_channels=3, num_stages=4, spatial_strides=(2, 2, 2, 2), frozen_stages=- 1, se_style='half', se_ratio=0.0625, use_swish=True, conv_cfg={'type': 'Conv3d'}, norm_cfg={'requires_grad': True, 'type': 'BN3d'}, act_cfg={'inplace': True, 'type': 'ReLU'}, norm_eval=False, with_cp=False, zero_init_residual=True, **kwargs)[源代码]¶

X3D backbone. https://arxiv.org/pdf/2004.04730.pdf.

参数

gamma_w (float) – Global channel width expansion factor. Default: 1.
gamma_b (float) – Bottleneck channel width expansion factor. Default: 1.
gamma_d (float) – Network depth expansion factor. Default: 1.
pretrained (str | None) – Name of pretrained model. Default: None.
in_channels (int) – Channel num of input features. Default: 3.
num_stages (int) – Resnet stages. Default: 4.
spatial_strides (Sequence[int]) – Spatial strides of residual blocks of each stage. Default: (2, 2, 2, 2).
frozen_stages (int) – Stages to be frozen (all param fixed). If set to -1, it means not freezing any parameters. Default: -1.
se_style (str) – The style of inserting SE modules into BlockX3D, ‘half’ denotes insert into half of the blocks, while ‘all’ denotes insert into all blocks. Default: ‘half’.
se_ratio (float | None) – The reduction ratio of squeeze and excitation unit. If set as None, it means not using SE unit. Default: 1 / 16.
use_swish (bool) – Whether to use swish as the activation function before and after the 3x3x3 conv. Default: True.
conv_cfg (dict) – Config for conv layers. Required keys are type. Default: dict(type='Conv3d').
norm_cfg (dict) – Config for norm layers. Required keys are type and requires_grad. Default: dict(type='BN3d', requires_grad=True).
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU', inplace=True).
norm_eval (bool) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var). Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero initialization for residual block, Default: True.
kwargs (dict, optional) – Keyword arguments for “make_res_layer”.

forward(x)[源代码]¶

Defines the computation performed at every call.

参数: x (torch.Tensor) – The input data.
返回: The feature of the input samples extracted by the backbone.
返回类型: torch.Tensor

init_weights()[源代码]¶: Initiate the parameters either from existing checkpoint or from scratch.

make_res_layer(block, layer_inplanes, inplanes, planes, blocks, spatial_stride=1, se_style='half', se_ratio=None, use_swish=True, norm_cfg=None, act_cfg=None, conv_cfg=None, with_cp=False, **kwargs)[源代码]¶

Build residual layer for ResNet3D.

参数

block (nn.Module) – Residual module to be built.
layer_inplanes (int) – Number of channels for the input feature of the res layer.
inplanes (int) – Number of channels for the input feature in each block, which equals to base_channels * gamma_w.
planes (int) – Number of channels for the output feature in each block, which equals to base_channels * gamma_w * gamma_b.
blocks (int) – Number of residual blocks.
spatial_stride (int) – Spatial strides in residual and conv layers. Default: 1.
se_style (str) – The style of inserting SE modules into BlockX3D, ‘half’ denotes insert into half of the blocks, while ‘all’ denotes insert into all blocks. Default: ‘half’.
se_ratio (float | None) – The reduction ratio of squeeze and excitation unit. If set as None, it means not using SE unit. Default: None.
use_swish (bool) – Whether to use swish as the activation function before and after the 3x3x3 conv. Default: True.
conv_cfg (dict | None) – Config for conv layers. Default: None.
norm_cfg (dict | None) – Config for norm layers. Default: None.
act_cfg (dict | None) – Config for activation layers. Default: None.
with_cp (bool | None) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

返回

A residual layer for the given config.

返回类型

nn.Module
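The inplanes and planes formulas above, together with se_style, can be sketched in a few lines. This assumes a hypothetical base_channels of 24, ignores any width rounding a real implementation may apply, and assumes that 'half' means every other block starting with the first; the helper names are illustrative only.

```python
def x3d_block_channels(base_channels, gamma_w, gamma_b):
    """Return (inplanes, planes) following the documented formulas.

    inplanes = base_channels * gamma_w
    planes   = base_channels * gamma_w * gamma_b
    Assumption: no rounding to hardware-friendly widths is applied.
    """
    inplanes = int(base_channels * gamma_w)
    planes = int(base_channels * gamma_w * gamma_b)
    return inplanes, planes


def se_flags(blocks, se_style="half"):
    """Which blocks get an SE module (assumed alternation for 'half')."""
    if se_style == "all":
        return [True] * blocks
    if se_style == "half":
        return [i % 2 == 0 for i in range(blocks)]
    raise ValueError(f"unsupported se_style: {se_style!r}")


print(x3d_block_channels(24, gamma_w=1.0, gamma_b=2.25))  # (24, 54)
print(se_flags(5, "half"))  # [True, False, True, False, True]
```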

train(mode=True)[源代码]¶: Set the optimization status when training.

class mmaction.models.X3DHead(num_classes, in_channels, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', dropout_ratio=0.5, init_std=0.01, fc1_bias=False)[源代码]¶

Classification head for X3D.

参数

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’)
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
dropout_ratio (float) – Probability of dropout layer. Default: 0.5.
init_std (float) – Std value for Initiation. Default: 0.01.
fc1_bias (bool) – If the first fc layer has bias. Default: False.

forward(x)[源代码]¶

Defines the computation performed at every call.

参数: x (torch.Tensor) – The input data.
返回: The classification scores for input samples.
返回类型: torch.Tensor

init_weights()[源代码]¶: Initiate the parameters from scratch.

mmaction.models.build_backbone(cfg)[源代码]¶: Build backbone.

mmaction.models.build_head(cfg)[源代码]¶: Build head.

mmaction.models.build_localizer(cfg)[源代码]¶: Build localizer.

mmaction.models.build_loss(cfg)[源代码]¶: Build loss.

mmaction.models.build_model(cfg, train_cfg=None, test_cfg=None)[源代码]¶: Build model.

mmaction.models.build_neck(cfg)[源代码]¶: Build neck.

mmaction.models.build_recognizer(cfg, train_cfg=None, test_cfg=None)[源代码]¶: Build recognizer.
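The build_* functions above all take a config dict. The sketch below shows the general config-to-object pattern with a tiny dictionary-based registry. It is a simplified stand-in written for illustration, not the registry mmaction actually uses, and `ToyBackbone`, `register` and `build_from_cfg` are hypothetical names defined here.

```python
BACKBONES = {}


def register(registry):
    """Decorator that stores a class in the registry under its name."""
    def decorator(cls):
        registry[cls.__name__] = cls
        return cls
    return decorator


def build_from_cfg(cfg, registry):
    """Instantiate registry[cfg['type']] with the remaining keys."""
    args = dict(cfg)  # copy so the caller's config is left untouched
    obj_type = args.pop("type")
    if obj_type not in registry:
        raise KeyError(f"{obj_type} is not registered")
    return registry[obj_type](**args)


@register(BACKBONES)
class ToyBackbone:
    def __init__(self, depth, num_segments=8):
        self.depth = depth
        self.num_segments = num_segments


def build_backbone(cfg):
    return build_from_cfg(cfg, BACKBONES)


backbone = build_backbone(dict(type="ToyBackbone", depth=50))
print(type(backbone).__name__, backbone.depth, backbone.num_segments)
```

The 'type' key selects the class and every other key is passed to its constructor, which is why the parameter lists on this page map directly onto config entries.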

recognizers¶

class mmaction.models.recognizers.AudioRecognizer(backbone, cls_head, neck=None, train_cfg=None, test_cfg=None)[源代码]¶

Audio recognizer model framework.

forward(audios, label=None, return_loss=True)[源代码]¶: Define the computation performed at every call.

forward_gradcam(audios)[源代码]¶: Defines the computation performed at every call when using gradcam utils.

forward_test(audios)[源代码]¶: Defines the computation performed at every call when evaluation and testing.

forward_train(audios, labels)[源代码]¶: Defines the computation performed at every call when training.

train_step(data_batch, optimizer, **kwargs)[源代码]¶

The iteration step during training.

This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, the whole process including back propagation and optimizer updating is also defined in this method, such as GAN.

参数

data_batch (dict) – The output of dataloader.
optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

返回

It should contain at least 3 keys: loss, log_vars, num_samples. loss is a tensor for back propagation, which can be a weighted sum of multiple losses. log_vars contains all the variables to be sent to the logger. num_samples indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

返回类型

dict

val_step(data_batch, optimizer, **kwargs)[源代码]¶

The iteration step during validation.

This method shares the same signature as train_step(), but is used during val epochs. Note that the evaluation after training epochs is not implemented with this method, but with an evaluation hook.

class mmaction.models.recognizers.BaseRecognizer(backbone, cls_head, neck=None, train_cfg=None, test_cfg=None)[源代码]¶

Base class for recognizers.

All recognizers should subclass it. All subclasses should override:

forward_train, which supports the forward pass when training.
forward_test, which supports the forward pass when testing.

参数

backbone (dict) – Backbone modules to extract feature.
cls_head (dict) – Classification head to process feature.
train_cfg (dict | None) – Config for training. Default: None.
test_cfg (dict | None) – Config for testing. Default: None.

average_clip(cls_score, num_segs=1)[源代码]¶

Averaging class score over multiple clips.

Uses the averaging type defined in test_cfg (‘score’, ‘prob’, or None) to compute the final averaged class score. Only called in test mode.

参数

cls_score (torch.Tensor) – Class score to be averaged.
num_segs (int) – Number of clips for each input sample.

返回

Averaged class score.

返回类型

torch.Tensor
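The averaging described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: it assumes ‘score’ averages the raw class scores, ‘prob’ applies a softmax to each clip before averaging, and None returns the scores unchanged. Nested lists stand in for tensors, and the `average_clips` keyword is a hypothetical stand-in for the value read from test_cfg.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_clip(cls_score, num_segs=1, average_clips='score'):
    """Average per-clip scores; cls_score has batch * num_segs rows."""
    if average_clips not in ('score', 'prob', None):
        raise ValueError(f'unsupported average_clips: {average_clips!r}')
    if average_clips is None:
        return cls_score
    if len(cls_score) % num_segs != 0:
        raise ValueError('number of rows must be a multiple of num_segs')

    averaged = []
    for start in range(0, len(cls_score), num_segs):
        clips = cls_score[start:start + num_segs]
        if average_clips == 'prob':
            # Assumed semantics: normalise each clip before averaging.
            clips = [softmax(clip) for clip in clips]
        num_classes = len(clips[0])
        averaged.append(
            [sum(clip[k] for clip in clips) / num_segs
             for k in range(num_classes)])
    return averaged


scores = [[1.0, 3.0], [3.0, 1.0]]      # one sample, two clips, two classes
by_score = average_clip(scores, num_segs=2, average_clips='score')
by_prob = average_clip(scores, num_segs=2, average_clips='prob')
```

Averaging probabilities rather than raw scores keeps a single clip with very large logits from dominating the result.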

extract_feat(imgs)[source]¶

Extract features through a backbone.

Parameters: imgs (torch.Tensor) – The input images.
Returns: The extracted features.
Return type: torch.Tensor

forward(imgs, label=None, return_loss=True, **kwargs)[source]¶: Define the computation performed at every call.

abstract forward_gradcam(imgs)[source]¶: Defines the computation performed at every call when using gradcam utils.

abstract forward_test(imgs)[source]¶: Defines the computation performed at every call during evaluation and testing.

abstract forward_train(imgs, labels, **kwargs)[source]¶: Defines the computation performed at every call when training.

init_weights()[source]¶: Initialize the model network weights.

train_step(data_batch, optimizer, **kwargs)[source]¶

The iteration step during training.

This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, the whole process including back propagation and optimizer updating is also defined in this method, such as GAN.

Parameters

data_batch (dict) – The output of dataloader.
optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

Returns

It should contain at least 3 keys: loss, log_vars, and num_samples. loss is a tensor for back propagation, which can be a weighted sum of multiple losses. log_vars contains all the variables to be sent to the logger. num_samples indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

Return type

dict
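The shape of the returned dict can be sketched as follows. This is a simplified illustration, not the library's code: floats stand in for tensors, the helper name `build_train_step_outputs` is hypothetical, and summing every entry whose name contains "loss" is an assumed convention for forming the total loss.

```python
def build_train_step_outputs(losses, batch_size):
    """Assemble the three keys a train_step() result is expected to carry.

    losses: dict mapping names to floats (stand-ins for tensors).
    batch_size: number of samples in the batch on this device.
    """
    # Assumed convention: only entries named like a loss are summed.
    total = sum(value for name, value in losses.items() if 'loss' in name)
    log_vars = dict(losses)
    log_vars['loss'] = total
    return dict(loss=total, log_vars=log_vars, num_samples=batch_size)


outputs = build_train_step_outputs(
    {'loss_cls': 0.5, 'loss_aux': 0.25, 'top1_acc': 0.75}, batch_size=8)
```

The optimizer hook would then back-propagate `outputs['loss']`, while the logger averages `outputs['log_vars']` weighted by `outputs['num_samples']`.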

val_step(data_batch, optimizer, **kwargs)[source]¶

The iteration step during validation.

This method shares the same signature as train_step(), but is used during validation epochs. Note that the evaluation after training epochs is not implemented with this method, but with an evaluation hook.

property with_neck¶

Whether the recognizer has a neck.

Type: bool

class mmaction.models.recognizers.Recognizer2D(backbone, cls_head, neck=None, train_cfg=None, test_cfg=None)[source]¶

2D recognizer model framework.

forward_dummy(imgs, softmax=False)[source]¶

Used for computing network FLOPs.

See tools/analysis/get_flops.py.

Parameters: imgs (torch.Tensor) – Input images.
Returns: Class score.
Return type: Tensor

forward_gradcam(imgs)[source]¶: Defines the computation performed at every call when using gradcam utils.

forward_test(imgs)[source]¶: Defines the computation performed at every call during evaluation and testing.

forward_train(imgs, labels, **kwargs)[source]¶: Defines the computation performed at every call when training.

class mmaction.models.recognizers.Recognizer3D(backbone, cls_head, neck=None, train_cfg=None, test_cfg=None)[source]¶

3D recognizer model framework.

forward_dummy(imgs, softmax=False)[source]¶

Used for computing network FLOPs.

See tools/analysis/get_flops.py.

Parameters: imgs (torch.Tensor) – Input images.
Returns: Class score.
Return type: Tensor

forward_gradcam(imgs)[source]¶: Defines the computation performed at every call when using gradcam utils.

forward_test(imgs)[source]¶: Defines the computation performed at every call during evaluation and testing.

forward_train(imgs, labels, **kwargs)[source]¶: Defines the computation performed at every call when training.

localizers¶

class mmaction.models.localizers.BMN(temporal_dim, boundary_ratio, num_samples, num_samples_per_bin, feat_dim, soft_nms_alpha, soft_nms_low_threshold, soft_nms_high_threshold, post_process_top_k, feature_extraction_interval=16, loss_cls={'type': 'BMNLoss'}, hidden_dim_1d=256, hidden_dim_2d=128, hidden_dim_3d=512)[source]¶

Boundary Matching Network for temporal action proposal generation.

Please refer to BMN: Boundary-Matching Network for Temporal Action Proposal Generation. Code reference: https://github.com/JJBOY/BMN-Boundary-Matching-Network

Parameters

temporal_dim (int) – Total frames selected for each video.
boundary_ratio (float) – Ratio for determining video boundaries.
num_samples (int) – Number of samples for each proposal.
num_samples_per_bin (int) – Number of bin samples for each sample.
feat_dim (int) – Feature dimension.
soft_nms_alpha (float) – Soft NMS alpha.
soft_nms_low_threshold (float) – Soft NMS low threshold.
soft_nms_high_threshold (float) – Soft NMS high threshold.
post_process_top_k (int) – Top k proposals in post process.
feature_extraction_interval (int) – Interval used in feature extraction. Default: 16.
loss_cls (dict) – Config for building loss. Default: dict(type='BMNLoss').
hidden_dim_1d (int) – Hidden dim for 1d conv. Default: 256.
hidden_dim_2d (int) – Hidden dim for 2d conv. Default: 128.
hidden_dim_3d (int) – Hidden dim for 3d conv. Default: 512.

forward(raw_feature, gt_bbox=None, video_meta=None, return_loss=True)[source]¶: Define the computation performed at every call.

forward_test(raw_feature, video_meta)[source]¶: Define the computation performed at every call when testing.

forward_train(raw_feature, label_confidence, label_start, label_end)[source]¶: Define the computation performed at every call when training.

generate_labels(gt_bbox)[source]¶: Generate training labels.

class mmaction.models.localizers.BaseLocalizer(backbone, cls_head, train_cfg=None, test_cfg=None)[source]¶

Base class for localizers.

All localizers should subclass it. All subclasses should override forward_train, which supports the forward pass when training, and forward_test, which supports the forward pass when testing.

extract_feat(imgs)[source]¶

Extract features through a backbone.

Parameters: imgs (torch.Tensor) – The input images.
Returns: The extracted features.
Return type: torch.Tensor

forward(imgs, return_loss=True, **kwargs)[source]¶: Define the computation performed at every call.

abstract forward_test(imgs)[source]¶: Defines the computation performed at testing.

abstract forward_train(imgs, labels)[source]¶: Defines the computation performed at training.

init_weights()[source]¶: Weight initialization for model.

train_step(data_batch, optimizer, **kwargs)[source]¶

The iteration step during training.

This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, the whole process including back propagation and optimizer updating is also defined in this method, such as GAN.

Parameters

data_batch (dict) – The output of dataloader.
optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

Returns

It should contain at least 3 keys: loss, log_vars, and num_samples. loss is a tensor for back propagation, which can be a weighted sum of multiple losses. log_vars contains all the variables to be sent to the logger. num_samples indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

Return type

dict

val_step(data_batch, optimizer, **kwargs)[source]¶

The iteration step during validation.

This method shares the same signature as train_step(), but is used during validation epochs. Note that the evaluation after training epochs is not implemented with this method, but with an evaluation hook.

class mmaction.models.localizers.PEM(pem_feat_dim, pem_hidden_dim, pem_u_ratio_m, pem_u_ratio_l, pem_high_temporal_iou_threshold, pem_low_temporal_iou_threshold, soft_nms_alpha, soft_nms_low_threshold, soft_nms_high_threshold, post_process_top_k, feature_extraction_interval=16, fc1_ratio=0.1, fc2_ratio=0.1, output_dim=1)[source]¶

Proposals Evaluation Model for Boundary Sensitive Network.

Please refer to BSN: Boundary Sensitive Network for Temporal Action Proposal Generation.

Code reference: https://github.com/wzmsltw/BSN-boundary-sensitive-network

Parameters

pem_feat_dim (int) – Feature dimension.
pem_hidden_dim (int) – Hidden layer dimension.
pem_u_ratio_m (float) – Ratio for medium score proposals to balance data.
pem_u_ratio_l (float) – Ratio for low score proposals to balance data.
pem_high_temporal_iou_threshold (float) – High IoU threshold.
pem_low_temporal_iou_threshold (float) – Low IoU threshold.
soft_nms_alpha (float) – Soft NMS alpha.
soft_nms_low_threshold (float) – Soft NMS low threshold.
soft_nms_high_threshold (float) – Soft NMS high threshold.
post_process_top_k (int) – Top k proposals in post process.
feature_extraction_interval (int) – Interval used in feature extraction. Default: 16.
fc1_ratio (float) – Ratio for fc1 layer output. Default: 0.1.
fc2_ratio (float) – Ratio for fc2 layer output. Default: 0.1.
output_dim (int) – Output dimension. Default: 1.

forward(bsp_feature, reference_temporal_iou=None, tmin=None, tmax=None, tmin_score=None, tmax_score=None, video_meta=None, return_loss=True)[source]¶: Define the computation performed at every call.

forward_test(bsp_feature, tmin, tmax, tmin_score, tmax_score, video_meta)[source]¶: Define the computation performed at every call when testing.

forward_train(bsp_feature, reference_temporal_iou)[source]¶: Define the computation performed at every call when training.

class mmaction.models.localizers.SSN(backbone, cls_head, in_channels=3, spatial_type='avg', dropout_ratio=0.5, loss_cls={'type': 'SSNLoss'}, train_cfg=None, test_cfg=None)[source]¶

Temporal Action Detection with Structured Segment Networks.

Parameters

backbone (dict) – Config for building backbone.
cls_head (dict) – Config for building classification head.
in_channels (int) – Number of channels for input data. Default: 3.
spatial_type (str) – Type of spatial pooling. Default: ‘avg’.
dropout_ratio (float) – Ratio of dropout. Default: 0.5.
loss_cls (dict) – Config for building loss. Default: dict(type='SSNLoss').
train_cfg (dict | None) – Config for training. Default: None.
test_cfg (dict | None) – Config for testing. Default: None.

forward_test(imgs, relative_proposal_list, scale_factor_list, proposal_tick_list, reg_norm_consts, **kwargs)[source]¶: Define the computation performed at every call when testing.

forward_train(imgs, proposal_scale_factor, proposal_type, proposal_labels, reg_targets, **kwargs)[source]¶: Define the computation performed at every call when training.

class mmaction.models.localizers.TEM(temporal_dim, boundary_ratio, tem_feat_dim, tem_hidden_dim, tem_match_threshold, loss_cls={'type': 'BinaryLogisticRegressionLoss'}, loss_weight=2, output_dim=3, conv1_ratio=1, conv2_ratio=1, conv3_ratio=0.01)[source]¶

Temporal Evaluation Model for Boundary Sensitive Network.

Please refer to BSN: Boundary Sensitive Network for Temporal Action Proposal Generation.

Code reference: https://github.com/wzmsltw/BSN-boundary-sensitive-network

Parameters

tem_feat_dim (int) – Feature dimension.
tem_hidden_dim (int) – Hidden layer dimension.
tem_match_threshold (float) – Temporal evaluation match threshold.
loss_cls (dict) – Config for building loss. Default: dict(type='BinaryLogisticRegressionLoss').
loss_weight (float) – Weight term for action_loss. Default: 2.
output_dim (int) – Output dimension. Default: 3.
conv1_ratio (float) – Ratio of conv1 layer output. Default: 1.0.
conv2_ratio (float) – Ratio of conv2 layer output. Default: 1.0.
conv3_ratio (float) – Ratio of conv3 layer output. Default: 0.01.

forward(raw_feature, gt_bbox=None, video_meta=None, return_loss=True)[source]¶: Define the computation performed at every call.

forward_test(raw_feature, video_meta)[source]¶: Define the computation performed at every call when testing.

forward_train(raw_feature, label_action, label_start, label_end)[source]¶: Define the computation performed at every call when training.

generate_labels(gt_bbox)[source]¶: Generate training labels.

common¶

class mmaction.models.common.Conv2plus1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, norm_cfg={'type': 'BN3d'})[source]¶

(2+1)d Conv module for R(2+1)d backbone.

https://arxiv.org/pdf/1711.11248.pdf.

Parameters

in_channels (int) – Same as nn.Conv3d.
out_channels (int) – Same as nn.Conv3d.
kernel_size (int | tuple[int]) – Same as nn.Conv3d.
stride (int | tuple[int]) – Same as nn.Conv3d.
padding (int | tuple[int]) – Same as nn.Conv3d.
dilation (int | tuple[int]) – Same as nn.Conv3d.
groups (int) – Same as nn.Conv3d.
bias (bool | str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The output of the module.
Return type: torch.Tensor

init_weights()[source]¶: Initialize the parameters from scratch.
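A (2+1)d convolution replaces one t x d x d 3D kernel with a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution. The sketch below follows the rule from the R(2+1)D paper linked above for choosing the intermediate channel count so that the factorized pair has roughly as many parameters as the full 3D convolution. It is an illustration only; the function names are hypothetical and the module documented here may compute the value differently.

```python
def mid_channels_2plus1d(in_channels, out_channels, kernel_size):
    """Intermediate channels M = floor(t*d*d*Nin*Nout / (d*d*Nin + t*Nout))."""
    t, h, w = kernel_size
    full_3d_params = t * h * w * in_channels * out_channels
    params_per_mid_channel = h * w * in_channels + t * out_channels
    return full_3d_params // params_per_mid_channel


def factorized_params(in_channels, out_channels, kernel_size):
    """Weight count of the spatial conv plus the temporal conv (no bias)."""
    t, h, w = kernel_size
    mid = mid_channels_2plus1d(in_channels, out_channels, kernel_size)
    spatial = h * w * in_channels * mid      # 1 x h x w convolution
    temporal = t * mid * out_channels        # t x 1 x 1 convolution
    return spatial + temporal


mid = mid_channels_2plus1d(64, 64, (3, 3, 3))
total = factorized_params(64, 64, (3, 3, 3))
```

For 64 input and 64 output channels with a 3x3x3 kernel, the factorized pair matches the 110592 weights of the full 3D convolution while inserting an extra nonlinearity between the two steps.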

class mmaction.models.common.ConvAudio(in_channels, out_channels, kernel_size, op='concat', stride=1, padding=0, dilation=1, groups=1, bias=False)[source]¶

Conv2d module for AudioResNet backbone.

https://arxiv.org/abs/2001.08740.

Parameters

in_channels (int) – Same as nn.Conv2d.
out_channels (int) – Same as nn.Conv2d.
kernel_size (int | tuple[int]) – Same as nn.Conv2d.
op (string) – Operation to merge the output of freq and time feature map. Choices are ‘sum’ and ‘concat’. Default: ‘concat’.
stride (int | tuple[int]) – Same as nn.Conv2d.
padding (int | tuple[int]) – Same as nn.Conv2d.
dilation (int | tuple[int]) – Same as nn.Conv2d.
groups (int) – Same as nn.Conv2d.
bias (bool | str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The output of the module.
Return type: torch.Tensor

init_weights()[source]¶: Initialize the parameters from scratch.

class mmaction.models.common.LFB(lfb_prefix_path, max_num_sampled_feat=5, window_size=60, lfb_channels=2048, dataset_modes=('train', 'val'), device='gpu', lmdb_map_size=4000000000.0, construct_lmdb=True)[source]¶

Long-Term Feature Bank (LFB).

LFB is proposed in Long-Term Feature Banks for Detailed Video Understanding

The ROI features of videos are stored in the feature bank. The feature bank was generated by inferring with a lfb infer config.

Formally, LFB is a Dict whose keys are video IDs and its values are also Dicts whose keys are timestamps in seconds. Example of LFB:
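A minimal sketch of that nested layout is shown here. The video ID, timestamps, and feature values are made up for illustration, and each feature has only 3 channels instead of lfb_channels to keep the example short; plain lists stand in for tensors.

```python
# Hypothetical feature bank: video ID -> timestamp (seconds) -> ROI features.
lfb = {
    'video_0001': {
        901: [[0.12, -0.40, 0.88],      # two ROI features at second 901
              [0.05, 0.31, -0.27]],
        902: [[0.67, 0.02, -0.15]],     # one ROI feature at second 902
    },
}


def features_at(bank, video_id, timestamp):
    """Return the ROI features stored for one second, or [] if none exist."""
    return bank.get(video_id, {}).get(timestamp, [])


rois = features_at(lfb, 'video_0001', 901)
missing = features_at(lfb, 'video_0001', 999)
```

Keying by video and then by second makes it cheap to gather the features that fall inside a temporal window around a given timestamp.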

Parameters

lfb_prefix_path (str) – The storage path of lfb.
max_num_sampled_feat (int) – The max number of sampled features. Default: 5.
window_size (int) – Window size of sampling long term feature. Default: 60.
lfb_channels (int) – Number of the channels of the features stored in LFB. Default: 2048.
dataset_modes (tuple[str] | str) – Load LFB of datasets with different modes, such as training, validation, testing datasets. If you don’t do cross validation during training, just load the training dataset i.e. setting dataset_modes = (‘train’). Default: (‘train’, ‘val’).
device (str) – Where to load lfb. Choices are ‘gpu’, ‘cpu’ and ‘lmdb’. A 1.65GB half-precision ava lfb (including training and validation) occupies about 2GB GPU memory. Default: ‘gpu’.
lmdb_map_size (int) – Map size of lmdb. Default: 4e9.
construct_lmdb (bool) – Whether to construct lmdb. If you have constructed lmdb of lfb, you can set to False to skip the construction. Default: True.

class mmaction.models.common.TAM(in_channels, num_segments, alpha=2, adaptive_kernel_size=3, beta=4, conv1d_kernel_size=3, adaptive_convolution_stride=1, adaptive_convolution_padding=1, init_std=0.001)[source]¶

Temporal Adaptive Module (TAM) for TANet.

This module is proposed in TAM: TEMPORAL ADAPTIVE MODULE FOR VIDEO RECOGNITION

Parameters

in_channels (int) – Channel num of input features.
num_segments (int) – Number of frame segments.
alpha (int) – `alpha` in the paper and is the ratio of the intermediate channel number to the initial channel number in the global branch. Default: 2.
adaptive_kernel_size (int) – `K` in the paper and is the size of the adaptive kernel size in the global branch. Default: 3.
beta (int) – `beta` in the paper and is set to control the model complexity in the local branch. Default: 4.
conv1d_kernel_size (int) – Size of the convolution kernel of Conv1d in the local branch. Default: 3.
adaptive_convolution_stride (int) – The first dimension of strides in the adaptive convolution of `Temporal Adaptive Aggregation`. Default: 1.
adaptive_convolution_padding (int) – The first dimension of paddings in the adaptive convolution of `Temporal Adaptive Aggregation`. Default: 1.
init_std (float) – Std value for initiation of nn.Linear. Default: 0.001.

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The output of the module.
Return type: torch.Tensor

init_weights()[source]¶: Initialize the parameters from scratch.

backbones¶

class mmaction.models.backbones.C3D(pretrained=None, style='pytorch', conv_cfg=None, norm_cfg=None, act_cfg=None, dropout_ratio=0.5, init_std=0.005)[source]¶

C3D backbone.

Parameters

pretrained (str | None) – Name of pretrained model.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: ‘pytorch’.
conv_cfg (dict | None) – Config dict for convolution layer. If set to None, it uses dict(type='Conv3d') to construct layers. Default: None.
norm_cfg (dict | None) – Config for norm layers. The required key is type. Default: None.
act_cfg (dict | None) – Config dict for activation layer. If set to None, it uses dict(type='ReLU') to construct layers. Default: None.
dropout_ratio (float) – Probability of dropout layer. Default: 0.5.
init_std (float) – Std value for initialization of fc layers. Default: 0.005.

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data. The size of x is (num_batches, 3, 16, 112, 112).
Returns: The feature of the input samples extracted by the backbone.
Return type: torch.Tensor

init_weights()[source]¶: Initialize the parameters either from an existing checkpoint or from scratch.

class mmaction.models.backbones.MobileNetV2(pretrained=None, widen_factor=1.0, out_indices=(7,), frozen_stages=-1, conv_cfg={'type': 'Conv'}, norm_cfg={'requires_grad': True, 'type': 'BN2d'}, act_cfg={'inplace': True, 'type': 'ReLU6'}, norm_eval=False, with_cp=False)[source]¶

MobileNetV2 backbone.

Parameters

pretrained (str | None) – Name of pretrained model. Default: None.
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.
out_indices (None or Sequence[int]) – Output from which stages. Default: (7, ).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Default: dict(type='Conv').
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN2d', requires_grad=True).
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU6', inplace=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

forward(x)[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

make_layer(out_channels, num_blocks, stride, expand_ratio)[source]¶

Stack InvertedResidual blocks to build a layer for MobileNetV2.

Parameters

out_channels (int) – out_channels of block.
num_blocks (int) – number of blocks.
stride (int) – stride of the first block. Default: 1
expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio. Default: 6.

train(mode=True)[source]¶

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters: mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns: self
Return type: Module

class mmaction.models.backbones.MobileNetV2TSM(num_segments=8, is_shift=True, shift_div=8, **kwargs)[source]¶

MobileNetV2 backbone for TSM.

Parameters

num_segments (int) – Number of frame segments. Default: 8.
is_shift (bool) – Whether to make temporal shift in residual layers. Default: True.
shift_div (int) – Number of div for shift. Default: 8.
**kwargs (keyword arguments, optional) – Arguments for MobileNetV2.

init_weights()[source]¶: Initialize the parameters either from an existing checkpoint or from scratch.

make_temporal_shift()[source]¶: Make temporal shift for some layers.
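The temporal shift itself can be sketched in plain Python. This follows the general idea of the Temporal Shift Module: one 1/shift_div slice of the channels takes its values from the next segment, another 1/shift_div slice from the previous segment, and the rest stay in place. It is not the library's implementation; the shift directions and zero padding at the boundaries are assumptions, and nested lists stand in for one sample's (num_segments, channels) feature.

```python
def temporal_shift(x, shift_div=8):
    """Shift a fraction of channels along the segment (time) axis.

    x: list of num_segments rows, each a list of channel values.
    """
    num_segments = len(x)
    channels = len(x[0])
    fold = channels // shift_div
    out = [[0] * channels for _ in range(num_segments)]
    for t in range(num_segments):
        for c in range(channels):
            if c < fold:
                # First slice: pull from the next segment, zero at the end.
                if t + 1 < num_segments:
                    out[t][c] = x[t + 1][c]
            elif c < 2 * fold:
                # Second slice: pull from the previous segment, zero at start.
                if t - 1 >= 0:
                    out[t][c] = x[t - 1][c]
            else:
                # Remaining channels are left untouched.
                out[t][c] = x[t][c]
    return out


# Three segments, eight channels; value encodes (segment * 10 + channel).
features = [[t * 10 + c for c in range(8)] for t in range(3)]
shifted = temporal_shift(features, shift_div=4)
```

With shift_div=4 and 8 channels, channels 0 and 1 move one way, channels 2 and 3 move the other way, and channels 4 to 7 are unchanged, so temporal information is mixed without adding any parameters.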

class mmaction.models.backbones.ResNet(depth, pretrained=None, torchvision_pretrain=True, in_channels=3, num_stages=4, out_indices=(3,), strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), style='pytorch', frozen_stages=-1, conv_cfg={'type': 'Conv'}, norm_cfg={'requires_grad': True, 'type': 'BN2d'}, act_cfg={'inplace': True, 'type': 'ReLU'}, norm_eval=False, partial_bn=False, with_cp=False)[source]¶

ResNet backbone.

Parameters

depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
pretrained (str | None) – Name of pretrained model. Default: None.
in_channels (int) – Channel num of input features. Default: 3.
num_stages (int) – Resnet stages. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage.
out_indices (Sequence[int]) – Indices of output feature. Default: (3, ).
dilations (Sequence[int]) – Dilation of each stage.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: pytorch.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict) – Config for conv layers. Default: dict(type='Conv').
norm_cfg (dict) – Config for norm layers. Required keys are type and requires_grad. Default: dict(type='BN2d', requires_grad=True).
act_cfg (dict) – Config for activation layers. Default: dict(type='ReLU', inplace=True).
norm_eval (bool) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var). Default: False.
partial_bn (bool) – Whether to use partial bn. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The feature of the input samples extracted by the backbone.
Return type: torch.Tensor

init_weights()[source]¶: Initialize the parameters either from an existing checkpoint or from scratch.

train(mode=True)[source]¶: Set the optimization status when training.

class mmaction.models.backbones.ResNet2Plus1d(*args, **kwargs)[source]¶

ResNet (2+1)d backbone.

This model is proposed in A Closer Look at Spatiotemporal Convolutions for Action Recognition

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The feature of the input samples extracted by the backbone.
Return type: torch.Tensor

class mmaction.models.backbones.ResNet3d(depth, pretrained, pretrained2d=True, in_channels=3, num_stages=4, base_channels=64, out_indices=(3,), spatial_strides=(1, 2, 2, 2), temporal_strides=(1, 1, 1, 1), dilations=(1, 1, 1, 1), conv1_kernel=(5, 7, 7), conv1_stride_t=2, pool1_stride_t=2, with_pool2=True, style='pytorch', frozen_stages=-1, inflate=(1, 1, 1, 1), inflate_style='3x1x1', conv_cfg={'type': 'Conv3d'}, norm_cfg={'requires_grad': True, 'type': 'BN3d'}, act_cfg={'inplace': True, 'type': 'ReLU'}, norm_eval=False, with_cp=False, non_local=(0, 0, 0, 0), non_local_cfg={}, zero_init_residual=True, **kwargs)[source]¶

ResNet 3d backbone.

Parameters

depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
pretrained (str | None) – Name of pretrained model.
pretrained2d (bool) – Whether to load pretrained 2D model. Default: True.
in_channels (int) – Channel num of input features. Default: 3.
base_channels (int) – Channel num of stem output features. Default: 64.
out_indices (Sequence[int]) – Indices of output feature. Default: (3, ).
num_stages (int) – Resnet stages. Default: 4.
spatial_strides (Sequence[int]) – Spatial strides of residual blocks of each stage. Default: (1, 2, 2, 2).
temporal_strides (Sequence[int]) – Temporal strides of residual blocks of each stage. Default: (1, 1, 1, 1).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
conv1_kernel (Sequence[int]) – Kernel size of the first conv layer. Default: (5, 7, 7).
conv1_stride_t (int) – Temporal stride of the first conv layer. Default: 2.
pool1_stride_t (int) – Temporal stride of the first pooling layer. Default: 2.
with_pool2 (bool) – Whether to use pool2. Default: True.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: ‘pytorch’.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters. Default: -1.
inflate (Sequence[int]) – Inflate Dims of each block. Default: (1, 1, 1, 1).
inflate_style (str) – 3x1x1 or 1x1x1, which determines the kernel sizes and padding strides for conv1 and conv2 in each block. Default: ‘3x1x1’.
conv_cfg (dict) – Config for conv layers. The required key is type. Default: dict(type='Conv3d').
norm_cfg (dict) – Config for norm layers. required keys are type and requires_grad. Default: dict(type='BN3d', requires_grad=True).
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU', inplace=True).
norm_eval (bool) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var). Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
non_local (Sequence[int]) – Determine whether to apply non-local module in the corresponding block of each stages. Default: (0, 0, 0, 0).
non_local_cfg (dict) – Config for non-local module. Default: dict().
zero_init_residual (bool) – Whether to use zero initialization for residual block, Default: True.
kwargs (dict, optional) – Key arguments for “make_res_layer”.
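When a 3D backbone is initialized from 2D weights (see pretrained2d above), a common approach is I3D-style inflation: each 2D kernel is repeated along the new temporal axis and divided by the temporal size, so that a clip of identical frames produces the same response as the original 2D filter on one frame. The sketch below illustrates that idea with nested lists; it is an assumption about the general technique, not a description of how this class loads its weights, and `inflate_2d_kernel` is a hypothetical helper.

```python
def inflate_2d_kernel(kernel_2d, temporal_size):
    """Turn an H x W kernel into a T x H x W kernel with the same total sum."""
    return [
        [[value / temporal_size for value in row] for row in kernel_2d]
        for _ in range(temporal_size)
    ]


kernel_2d = [[3.0, 6.0],
             [9.0, 12.0]]
kernel_3d = inflate_2d_kernel(kernel_2d, temporal_size=3)

# Summing over the temporal axis recovers the original 2D kernel.
recovered = [
    [sum(kernel_3d[t][i][j] for t in range(3)) for j in range(2)]
    for i in range(2)
]
```

Dividing by the temporal size keeps activation magnitudes comparable to the 2D network, which tends to make fine-tuning from image-pretrained weights more stable.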

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The feature of the input samples extracted by the backbone.
Return type: torch.Tensor

static make_res_layer(block, inplanes, planes, blocks, spatial_stride=1, temporal_stride=1, dilation=1, style='pytorch', inflate=1, inflate_style='3x1x1', non_local=0, non_local_cfg={}, norm_cfg=None, act_cfg=None, conv_cfg=None, with_cp=False, **kwargs)[source]¶

Build residual layer for ResNet3D.

Parameters

block (nn.Module) – Residual module to be built.
inplanes (int) – Number of channels for the input feature in each block.
planes (int) – Number of channels for the output feature in each block.
blocks (int) – Number of residual blocks.
spatial_stride (int | Sequence[int]) – Spatial strides in residual and conv layers. Default: 1.
temporal_stride (int | Sequence[int]) – Temporal strides in residual and conv layers. Default: 1.
dilation (int) – Spacing between kernel elements. Default: 1.
style (str) – pytorch or caffe. If set to pytorch, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: pytorch.
inflate (int | Sequence[int]) – Determine whether to inflate for each block. Default: 1.
inflate_style (str) – 3x1x1 or 1x1x1, which determines the kernel sizes and padding strides for conv1 and conv2 in each block. Default: ‘3x1x1’.
non_local (int | Sequence[int]) – Determine whether to apply non-local module in the corresponding block of each stages. Default: 0.
non_local_cfg (dict) – Config for non-local module. Default: dict().
conv_cfg (dict | None) – Config for conv layers. Default: None.
norm_cfg (dict | None) – Config for norm layers. Default: None.
act_cfg (dict | None) – Config for activation layers. Default: None.
with_cp (bool | None) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

Returns

A residual layer for the given config.

Return type

nn.Module

train(mode=True)[source]¶: Set the optimization status when training.

class mmaction.models.backbones.ResNet3dCSN(depth, pretrained, temporal_strides=(1, 2, 2, 2), conv1_kernel=(3, 7, 7), conv1_stride_t=1, pool1_stride_t=1, norm_cfg={'eps': 0.001, 'requires_grad': True, 'type': 'BN3d'}, inflate_style='3x3x3', bottleneck_mode='ir', bn_frozen=False, **kwargs)[source]¶

ResNet backbone for CSN.

Parameters

depth (int) – Depth of ResNetCSN, from {18, 34, 50, 101, 152}.
pretrained (str | None) – Name of pretrained model.
temporal_strides (tuple[int]) – Temporal strides of residual blocks of each stage. Default: (1, 2, 2, 2).
conv1_kernel (tuple[int]) – Kernel size of the first conv layer. Default: (3, 7, 7).
conv1_stride_t (int) – Temporal stride of the first conv layer. Default: 1.
pool1_stride_t (int) – Temporal stride of the first pooling layer. Default: 1.
norm_cfg (dict) – Config for norm layers. Required keys are type and requires_grad. Default: dict(type='BN3d', requires_grad=True, eps=1e-3).
inflate_style (str) – 3x1x1 or 1x1x1, which determines the kernel sizes and padding strides for conv1 and conv2 in each block. Default: ‘3x3x3’.
bottleneck_mode (str) –
Determines how a 3D bottleneck block is factorized using channel-separated convolutional networks.

If set to ‘ip’, it will replace the 3x3x3 conv2 layer with a 1x1x1 traditional convolution and a 3x3x3 depthwise convolution, i.e., the interaction-preserved channel-separated bottleneck block. If set to ‘ir’, it will replace the 3x3x3 conv2 layer with a 3x3x3 depthwise convolution, which is derived from the interaction-preserved bottleneck block by removing the extra 1x1x1 convolution, i.e., the interaction-reduced channel-separated bottleneck block.

Default: ‘ir’.
kwargs (dict, optional) – Key arguments for “make_res_layer”.
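To make the difference between the two bottleneck modes concrete, the sketch below counts the weights that replace the 3x3x3 conv2 layer in each case, following the descriptions above. It is a back-of-the-envelope illustration that ignores biases and normalization layers and assumes equal input and output channel counts; the function name and the 'standard' mode label are hypothetical.

```python
def conv2_weight_count(channels, mode):
    """Weights of the conv2 stage for a bottleneck with `channels` channels."""
    kernel_volume = 3 * 3 * 3
    if mode == 'standard':
        # Ordinary 3x3x3 convolution mixing all channels.
        return kernel_volume * channels * channels
    if mode == 'ip':
        # 1x1x1 traditional convolution followed by 3x3x3 depthwise conv.
        return channels * channels + kernel_volume * channels
    if mode == 'ir':
        # 3x3x3 depthwise convolution only.
        return kernel_volume * channels
    raise ValueError(f'unknown mode: {mode!r}')


counts = {mode: conv2_weight_count(64, mode)
          for mode in ('standard', 'ip', 'ir')}
```

With 64 channels the standard layer needs 110592 weights, ‘ip’ needs 5824, and ‘ir’ needs 1728, which shows how much of the cost comes from channel interaction rather than from the spatiotemporal kernel.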

train(mode=True)[source]¶: Set the optimization status when training.

class mmaction.models.backbones.ResNet3dLayer(depth, pretrained, pretrained2d=True, stage=3, base_channels=64, spatial_stride=2, temporal_stride=1, dilation=1, style='pytorch', all_frozen=False, inflate=1, inflate_style='3x1x1', conv_cfg={'type': 'Conv3d'}, norm_cfg={'requires_grad': True, 'type': 'BN3d'}, act_cfg={'inplace': True, 'type': 'ReLU'}, norm_eval=False, with_cp=False, zero_init_residual=True, **kwargs)[source]¶

ResNet 3d Layer.

Parameters

depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
pretrained (str | None) – Name of pretrained model.
pretrained2d (bool) – Whether to load pretrained 2D model. Default: True.
stage (int) – The index of Resnet stage. Default: 3.
base_channels (int) – Channel num of stem output features. Default: 64.
spatial_stride (int) – The 1st res block’s spatial stride. Default: 2.
temporal_stride (int) – The 1st res block’s temporal stride. Default: 1.
dilation (int) – The dilation. Default: 1.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: ‘pytorch’.
all_frozen (bool) – Frozen all modules in the layer. Default: False.
inflate (int) – Inflate Dims of each block. Default: 1.
inflate_style (str) – 3x1x1 or 1x1x1, which determines the kernel sizes and padding strides for conv1 and conv2 in each block. Default: ‘3x1x1’.
conv_cfg (dict) – Config for conv layers. The required key is type. Default: dict(type='Conv3d').
norm_cfg (dict) – Config for norm layers. required keys are type and requires_grad. Default: dict(type='BN3d', requires_grad=True).
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU', inplace=True).
norm_eval (bool) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var). Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero initialization for residual block, Default: True.
kwargs (dict, optional) – Key arguments for “make_res_layer”.

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The feature of the input samples extracted by the backbone.
Return type: torch.Tensor

train(mode=True)[source]¶: Set the optimization status when training.

class mmaction.models.backbones.ResNet3dSlowFast(pretrained, resample_rate=8, speed_ratio=8, channel_ratio=8, slow_pathway={'conv1_kernel': (1, 7, 7), 'conv1_stride_t': 1, 'depth': 50, 'dilations': (1, 1, 1, 1), 'inflate': (0, 0, 1, 1), 'lateral': True, 'pool1_stride_t': 1, 'pretrained': None, 'type': 'resnet3d'}, fast_pathway={'base_channels': 8, 'conv1_kernel': (5, 7, 7), 'conv1_stride_t': 1, 'depth': 50, 'lateral': False, 'pool1_stride_t': 1, 'pretrained': None, 'type': 'resnet3d'})[source]¶

SlowFast backbone.

This module is proposed in SlowFast Networks for Video Recognition.

Parameters

pretrained (str) – The file path to a pretrained model.
resample_rate (int) – A large temporal stride resample_rate on input frames. The actual resample rate is calculated by multiplying the interval in SampleFrames in the pipeline by resample_rate, equivalent to the \(\tau\) in the paper, i.e. it processes only one out of every resample_rate * interval frames. Default: 8.
speed_ratio (int) – Speed ratio indicating the ratio between time dimension of the fast and slow pathway, corresponding to the \(\alpha\) in the paper. Default: 8.
channel_ratio (int) – Reduce the channel number of fast pathway by channel_ratio, corresponding to \(\beta\) in the paper. Default: 8.
slow_pathway (dict) –
Configuration of the slow branch. It should contain the arguments needed to build the specific type of pathway, plus: type (str): the type of backbone the pathway is based on; lateral (bool): whether to build lateral connections for the pathway. Default:
```
dict(type='resnet3d',
     lateral=True, depth=50, pretrained=None,
     conv1_kernel=(1, 7, 7), dilations=(1, 1, 1, 1),
     conv1_stride_t=1, pool1_stride_t=1, inflate=(0, 0, 1, 1))
```

fast_pathway (dict) –

Configuration of the fast branch, similar to slow_pathway. Default:

```
dict(type='resnet3d',
     lateral=False, depth=50, pretrained=None, base_channels=8,
     conv1_kernel=(5, 7, 7), conv1_stride_t=1, pool1_stride_t=1)
```
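As a rough illustration of how resample_rate, speed_ratio and channel_ratio interact, the sketch below computes which frame indices each pathway would see under plain strided subsampling. The helper names `pathway_frame_indices` and `fast_channels`, the strided indexing, and the slow-pathway stem width of 64 are assumptions made for illustration; the backbone itself may resample differently.

```python
def pathway_frame_indices(num_frames, resample_rate=8, speed_ratio=8):
    """Return (slow_indices, fast_indices) for a clip of num_frames frames.

    Assumption: each pathway keeps every k-th frame, with
    k = resample_rate for the slow pathway and
    k = resample_rate // speed_ratio for the fast pathway.
    """
    slow_stride = resample_rate
    fast_stride = max(resample_rate // speed_ratio, 1)
    slow = list(range(0, num_frames, slow_stride))
    fast = list(range(0, num_frames, fast_stride))
    return slow, fast


def fast_channels(slow_base_channels=64, channel_ratio=8):
    """Stem width of the fast pathway, reduced by channel_ratio."""
    return slow_base_channels // channel_ratio


slow, fast = pathway_frame_indices(32)
print(len(slow), len(fast))  # 4 32
print(fast_channels())       # 8
```

With the defaults, the fast pathway sees speed_ratio times as many frames as the slow pathway while using 1 / channel_ratio of its channels, which agrees with base_channels=8 in the default fast_pathway config.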

forward(x)[source]¶

Defines the computation performed at every call.

Parameters

x (torch.Tensor) – The input data.

Returns

The feature of the input samples extracted by the backbone.

Return type

tuple[torch.Tensor]

init_weights(pretrained=None)[source]¶: Initialize the parameters either from an existing checkpoint or from scratch.

class mmaction.models.backbones.ResNet3dSlowOnly(*args, lateral=False, conv1_kernel=(1, 7, 7), conv1_stride_t=1, pool1_stride_t=1, inflate=(0, 0, 1, 1), with_pool2=False, **kwargs)[source]¶

SlowOnly backbone based on ResNet3dPathway.

Parameters

*args (arguments) – Arguments same as ResNet3dPathway.
conv1_kernel (Sequence[int]) – Kernel size of the first conv layer. Default: (1, 7, 7).
conv1_stride_t (int) – Temporal stride of the first conv layer. Default: 1.
pool1_stride_t (int) – Temporal stride of the first pooling layer. Default: 1.
inflate (Sequence[int]) – Inflate Dims of each block. Default: (0, 0, 1, 1).
**kwargs (keyword arguments) – Keyword arguments for ResNet3dPathway.

class mmaction.models.backbones.ResNetAudio(depth, pretrained, in_channels=1, num_stages=4, base_channels=32, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), conv1_kernel=9, conv1_stride=1, frozen_stages=-1, factorize=(1, 1, 0, 0), norm_eval=False, with_cp=False, conv_cfg={'type': 'Conv'}, norm_cfg={'requires_grad': True, 'type': 'BN2d'}, act_cfg={'inplace': True, 'type': 'ReLU'}, zero_init_residual=True)[source]¶

ResNet 2d audio backbone. Reference: https://arxiv.org/abs/2001.08740.

Parameters

depth (int) – Depth of resnet, from {50, 101, 152}.
pretrained (str | None) – Name of pretrained model.
in_channels (int) – Channel num of input features. Default: 1.
base_channels (int) – Channel num of stem output features. Default: 32.
num_stages (int) – Resnet stages. Default: 4.
strides (Sequence[int]) – Strides of residual blocks of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
conv1_kernel (int) – Kernel size of the first conv layer. Default: 9.
conv1_stride (int | tuple[int]) – Stride of the first conv layer. Default: 1.
frozen_stages (int) – Stages to be frozen (all parameters fixed). -1 means not freezing any parameters. Default: -1.
factorize (Sequence[int]) – factorize Dims of each block for audio. Default: (1, 1, 0, 0).
norm_eval (bool) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var). Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
conv_cfg (dict) – Config for conv layers. Default: dict(type='Conv').
norm_cfg (dict) – Config for norm layers. The required keys are type and requires_grad. Default: dict(type='BN2d', requires_grad=True).
act_cfg (dict) – Config for activation layers. Default: dict(type='ReLU', inplace=True).
zero_init_residual (bool) – Whether to use zero initialization for residual blocks. Default: True.

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The feature of the input samples extracted by the backbone.
Return type: torch.Tensor

init_weights()[source]¶: Initialize the parameters either from an existing checkpoint or from scratch.

make_res_layer(block, inplanes, planes, blocks, stride=1, dilation=1, factorize=1, norm_cfg=None, with_cp=False)[source]¶

Build residual layer for ResNetAudio.

Parameters

block (nn.Module) – Residual module to be built.
inplanes (int) – Number of channels for the input feature in each block.
planes (int) – Number of channels for the output feature in each block.
blocks (int) – Number of residual blocks.
stride (int) – Stride of the residual layer. Default: 1.
dilation (int) – Spacing between kernel elements. Default: 1.
factorize (int | Sequence[int]) – Determine whether to factorize for each block. Default: 1.
norm_cfg (dict) – Config for norm layers. required keys are type and requires_grad. Default: None.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

Returns

A residual layer for the given config.

train(mode=True)[source]¶: Set the optimization status when training.

class mmaction.models.backbones.ResNetTIN(depth, num_segments=8, is_tin=True, shift_div=4, **kwargs)[source]¶

ResNet backbone for TIN.

Parameters

depth (int) – Depth of ResNet, from {18, 34, 50, 101, 152}.
num_segments (int) – Number of frame segments. Default: 8.
is_tin (bool) – Whether to apply temporal interlace. Default: True.
shift_div (int) – Number of division parts for shift. Default: 4.
kwargs (dict, optional) – Arguments for ResNet.

init_weights()[source]¶: Initialize the parameters either from an existing checkpoint or from scratch.

make_temporal_interlace()[source]¶: Make temporal interlace for some layers.

class mmaction.models.backbones.ResNetTSM(depth, num_segments=8, is_shift=True, non_local=(0, 0, 0, 0), non_local_cfg={}, shift_div=8, shift_place='blockres', temporal_pool=False, **kwargs)[source]¶

ResNet backbone for TSM.

Parameters

depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
num_segments (int) – Number of frame segments. Default: 8.
is_shift (bool) – Whether to apply temporal shift in resnet layers. Default: True.
non_local (Sequence[int]) – Determine whether to apply non-local module in the corresponding block of each stage. Default: (0, 0, 0, 0).
non_local_cfg (dict) – Config for non-local module. Default: dict().
shift_div (int) – Number of div for shift. Default: 8.
shift_place (str) – Places in resnet layers for shift, which is chosen from [‘block’, ‘blockres’]. If set to ‘block’, it will apply temporal shift to all child blocks in each resnet layer. If set to ‘blockres’, it will apply temporal shift to each conv1 layer of all child blocks in each resnet layer. Default: ‘blockres’.
temporal_pool (bool) – Whether to add temporal pooling. Default: False.
**kwargs (keyword arguments, optional) – Arguments for ResNet.
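To make the role of shift_div concrete, here is a minimal pure-Python sketch of a temporal shift in the style described in the TSM paper: one 1 / shift_div slice of the channels is moved one step backward in time, another slice one step forward, and the rest stay in place. The function name `temporal_shift`, the nested-list layout and the zero filling at the clip boundaries are assumptions for illustration, not the backbone's actual code.

```python
def temporal_shift(frames, shift_div=8):
    """Shift a fraction of channels along the time axis.

    frames: list of T frames, each a list of C channel values.
    The first C // shift_div channels read from the next frame,
    the following C // shift_div channels read from the previous frame,
    and the remaining channels are left untouched.
    Positions with no source frame are zero-filled.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // shift_div
    out = [[0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            if c < fold:            # shifted left: take value from t + 1
                src = t + 1
            elif c < 2 * fold:      # shifted right: take value from t - 1
                src = t - 1
            else:                   # not shifted
                src = t
            if 0 <= src < num_frames:
                out[t][c] = frames[src][c]
    return out


# 3 frames, 8 channels; frame t holds 10 * (t + 1) in every channel.
clip = [[10 * (t + 1)] * 8 for t in range(3)]
for row in temporal_shift(clip, shift_div=4):
    print(row)
# [20, 20, 0, 0, 10, 10, 10, 10]
# [30, 30, 10, 10, 20, 20, 20, 20]
# [0, 0, 20, 20, 30, 30, 30, 30]
```

A larger shift_div moves fewer channels, so less information is exchanged between neighbouring frames.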

init_weights()[source]¶: Initialize the parameters either from an existing checkpoint or from scratch.

make_temporal_pool()[source]¶: Make temporal pooling between layer1 and layer2, using a 3D max pooling layer.

make_temporal_shift()[source]¶: Make temporal shift for some layers.

class mmaction.models.backbones.TANet(depth, num_segments, tam_cfg={}, **kwargs)[source]¶

Temporal Adaptive Network (TANet) backbone.

This backbone is proposed in TAM: TEMPORAL ADAPTIVE MODULE FOR VIDEO RECOGNITION.

Embedding the temporal adaptive module (TAM) into ResNet to instantiate TANet.

Parameters

depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
num_segments (int) – Number of frame segments.
tam_cfg (dict | None) – Config for temporal adaptive module (TAM). Default: dict().
**kwargs (keyword arguments, optional) – Arguments for ResNet except `depth`.

init_weights()[source]¶: Initialize the parameters either from an existing checkpoint or from scratch.

make_tam_modeling()[source]¶: Replace ResNet-Block with TA-Block.

class mmaction.models.backbones.X3D(gamma_w=1.0, gamma_b=1.0, gamma_d=1.0, pretrained=None, in_channels=3, num_stages=4, spatial_strides=(2, 2, 2, 2), frozen_stages=-1, se_style='half', se_ratio=0.0625, use_swish=True, conv_cfg={'type': 'Conv3d'}, norm_cfg={'requires_grad': True, 'type': 'BN3d'}, act_cfg={'inplace': True, 'type': 'ReLU'}, norm_eval=False, with_cp=False, zero_init_residual=True, **kwargs)[source]¶

X3D backbone. https://arxiv.org/pdf/2004.04730.pdf.

Parameters

gamma_w (float) – Global channel width expansion factor. Default: 1.
gamma_b (float) – Bottleneck channel width expansion factor. Default: 1.
gamma_d (float) – Network depth expansion factor. Default: 1.
pretrained (str | None) – Name of pretrained model. Default: None.
in_channels (int) – Channel num of input features. Default: 3.
num_stages (int) – Resnet stages. Default: 4.
spatial_strides (Sequence[int]) – Spatial strides of residual blocks of each stage. Default: (2, 2, 2, 2).
frozen_stages (int) – Stages to be frozen (all param fixed). If set to -1, it means not freezing any parameters. Default: -1.
se_style (str) – The style of inserting SE modules into BlockX3D, ‘half’ denotes insert into half of the blocks, while ‘all’ denotes insert into all blocks. Default: ‘half’.
se_ratio (float | None) – The reduction ratio of squeeze and excitation unit. If set as None, it means not using SE unit. Default: 1 / 16.
use_swish (bool) – Whether to use swish as the activation function before and after the 3x3x3 conv. Default: True.
conv_cfg (dict) – Config for conv layers. The required key is type. Default: dict(type='Conv3d').
norm_cfg (dict) – Config for norm layers. The required keys are type and requires_grad. Default: dict(type='BN3d', requires_grad=True).
act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU', inplace=True).
norm_eval (bool) – Whether to set BN layers to eval mode, namely, freeze running stats (mean and var). Default: False.
with_cp (bool) – Whether to use checkpointing. Checkpointing saves some memory at the cost of slower training. Default: False.
zero_init_residual (bool) – Whether to use zero initialization for residual blocks. Default: True.
kwargs (dict, optional) – Keyword arguments for “make_res_layer”.

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The feature of the input samples extracted by the backbone.
Return type: torch.Tensor

init_weights()[source]¶: Initialize the parameters either from an existing checkpoint or from scratch.

make_res_layer(block, layer_inplanes, inplanes, planes, blocks, spatial_stride=1, se_style='half', se_ratio=None, use_swish=True, norm_cfg=None, act_cfg=None, conv_cfg=None, with_cp=False, **kwargs)[source]¶

Build residual layer for ResNet3D.

Parameters

block (nn.Module) – Residual module to be built.
layer_inplanes (int) – Number of channels for the input feature of the res layer.
inplanes (int) – Number of channels for the input feature in each block, which equals base_channels * gamma_w.
planes (int) – Number of channels for the output feature in each block, which equals base_channels * gamma_w * gamma_b.
blocks (int) – Number of residual blocks.
spatial_stride (int) – Spatial strides in residual and conv layers. Default: 1.
se_style (str) – The style of inserting SE modules into BlockX3D, ‘half’ denotes insert into half of the blocks, while ‘all’ denotes insert into all blocks. Default: ‘half’.
se_ratio (float | None) – The reduction ratio of squeeze and excitation unit. If set as None, it means not using SE unit. Default: None.
use_swish (bool) – Whether to use swish as the activation function before and after the 3x3x3 conv. Default: True.
conv_cfg (dict | None) – Config for conv layers. Default: None.
norm_cfg (dict | None) – Config for norm layers. Default: None.
act_cfg (dict | None) – Config for activation layers. Default: None.
with_cp (bool | None) – Whether to use checkpointing. Checkpointing saves some memory at the cost of slower training. Default: False.

Returns

A residual layer for the given config.

Return type

nn.Module

train(mode=True)[source]¶: Set the optimization status when training.

heads¶

class mmaction.models.heads.AudioTSNHead(num_classes, in_channels, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', dropout_ratio=0.4, init_std=0.01, **kwargs)[source]¶

Classification head for TSN on audio.

Parameters

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’).
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
dropout_ratio (float) – Probability of dropout layer. Default: 0.4.
init_std (float) – Std value for Initiation. Default: 0.01.
kwargs (dict, optional) – Any keyword argument to be used to initialize the head.

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The classification scores for input samples.
Return type: torch.Tensor

init_weights()[source]¶: Initialize the parameters from scratch.

class mmaction.models.heads.BBoxHeadAVA(temporal_pool_type='avg', spatial_pool_type='max', in_channels=2048, num_classes=81, dropout_ratio=0, dropout_before_pool=True, topk=(3, 5), multilabel=True)[source]¶

Simplest RoI head, with only two fc layers for classification and regression respectively.

Parameters

temporal_pool_type (str) – The temporal pool type. Choices are ‘avg’ or ‘max’. Default: ‘avg’.
spatial_pool_type (str) – The spatial pool type. Choices are ‘avg’ or ‘max’. Default: ‘max’.
in_channels (int) – The number of input channels. Default: 2048.
num_classes (int) – The number of classes. Default: 81.
dropout_ratio (float) – A float in [0, 1], indicates the dropout_ratio. Default: 0.
dropout_before_pool (bool) – Dropout Feature before spatial temporal pooling. Default: True.
topk (int | tuple[int]) – Parameter for evaluating multilabel accuracy. Default: (3, 5).
multilabel (bool) – Whether used for a multilabel task. Default: True. (Only multilabel == True is supported for now.)

forward(x)[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

recall_prec(pred_vec, target_vec)[source]¶

Parameters

pred_vec (tensor[N x C]) – Each element is either 0 or 1.
target_vec (tensor[N x C]) – Each element is either 0 or 1.

class mmaction.models.heads.BaseHead(num_classes, in_channels, loss_cls={'loss_weight': 1.0, 'type': 'CrossEntropyLoss'}, multi_class=False, label_smooth_eps=0.0)[source]¶

Base class for head.

All heads should subclass it. All subclasses should override:

- init_weights: initializes the weights in some modules.
- forward: supports the forward pass for both training and testing.

Parameters

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’, loss_weight=1.0).
multi_class (bool) – Determines whether it is a multi-class recognition task. Default: False.
label_smooth_eps (float) – Epsilon used in label smooth. Reference: arxiv.org/abs/1906.02629. Default: 0.
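For readers unfamiliar with label smoothing, the sketch below shows the common formulation that label_smooth_eps most likely refers to: each one-hot target is mixed with a uniform distribution over the classes. The helper name `smooth_labels` and the exact mixing rule are assumptions based on the usual definition, not taken from the head's source.

```python
def smooth_labels(one_hot, eps):
    """Mix a one-hot target with a uniform distribution.

    Each entry y becomes (1 - eps) * y + eps / num_classes,
    so the smoothed target still sums to 1.
    """
    num_classes = len(one_hot)
    return [(1 - eps) * y + eps / num_classes for y in one_hot]


smoothed = smooth_labels([0.0, 1.0, 0.0, 0.0], eps=0.1)
print(smoothed)       # roughly [0.025, 0.925, 0.025, 0.025]
print(sum(smoothed))  # approximately 1.0
```

With eps set to 0 (the default above), the targets are left unchanged.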

abstract forward(x)[source]¶: Defines the computation performed at every call.

abstract init_weights()[source]¶: Initialize the parameters either from an existing checkpoint or from scratch.

loss(cls_score, labels, **kwargs)[source]¶

Calculate the loss given output cls_score, target labels.

Parameters

cls_score (torch.Tensor) – The output of the model.
labels (torch.Tensor) – The target output of the model.

Returns

A dict containing the field ‘loss_cls’ (mandatory) and ‘top1_acc’, ‘top5_acc’ (optional).

Return type

dict

class mmaction.models.heads.FBOHead(lfb_cfg, fbo_cfg, temporal_pool_type='avg', spatial_pool_type='max')[source]¶

Feature Bank Operator Head.

Add feature bank operator for the spatiotemporal detection model to fuse short-term features and long-term features.

Parameters

lfb_cfg (Dict) – The config dict for LFB which is used to sample long-term features.
fbo_cfg (Dict) – The config dict for the feature bank operator (FBO). The FBO type is also given in the config dict; the supported FBO types are those in fbo_dict.
temporal_pool_type (str) – The temporal pool type. Choices are ‘avg’ or ‘max’. Default: ‘avg’.
spatial_pool_type (str) – The spatial pool type. Choices are ‘avg’ or ‘max’. Default: ‘max’.

forward(x, rois, img_metas)[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights(pretrained=None)[source]¶

Initialize the weights in the module.

Parameters: pretrained (str, optional) – Path to pre-trained weights. Default: None.

sample_lfb(rois, img_metas)[source]¶: Sample long-term features for each ROI feature.

class mmaction.models.heads.I3DHead(num_classes, in_channels, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', dropout_ratio=0.5, init_std=0.01, **kwargs)[source]¶

Classification head for I3D.

Parameters

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’)
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
dropout_ratio (float) – Probability of dropout layer. Default: 0.5.
init_std (float) – Std value for Initiation. Default: 0.01.
kwargs (dict, optional) – Any keyword argument to be used to initialize the head.

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The classification scores for input samples.
Return type: torch.Tensor

init_weights()[source]¶: Initialize the parameters from scratch.

class mmaction.models.heads.LFBInferHead(lfb_prefix_path, dataset_mode='train', use_half_precision=True, temporal_pool_type='avg', spatial_pool_type='max')[source]¶

Long-Term Feature Bank Infer Head.

This head is used to derive and save the LFB without affecting the input.

Parameters

lfb_prefix_path (str) – The prefix path to store the lfb.
dataset_mode (str, optional) – Which dataset to be inferred. Choices are ‘train’, ‘val’ or ‘test’. Default: ‘train’.
use_half_precision (bool, optional) – Whether to store the half-precision roi features. Default: True.
temporal_pool_type (str) – The temporal pool type. Choices are ‘avg’ or ‘max’. Default: ‘avg’.
spatial_pool_type (str) – The spatial pool type. Choices are ‘avg’ or ‘max’. Default: ‘max’.

forward(x, rois, img_metas)[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmaction.models.heads.SSNHead(dropout_ratio=0.8, in_channels=1024, num_classes=20, consensus={'num_seg': (2, 5, 2), 'standalong_classifier': True, 'stpp_cfg': (1, 1, 1), 'type': 'STPPTrain'}, use_regression=True, init_std=0.001)[source]¶

The classification head for SSN.

Parameters

dropout_ratio (float) – Probability of dropout layer. Default: 0.8.
in_channels (int) – Number of channels for input data. Default: 1024.
num_classes (int) – Number of classes to be classified. Default: 20.
consensus (dict) – Config of segmental consensus.
use_regression (bool) – Whether to perform regression or not. Default: True.
init_std (float) – Std value for Initiation. Default: 0.001.

forward(x, test_mode=False)[source]¶: Defines the computation performed at every call.

init_weights()[source]¶: Initialize the parameters from scratch.

prepare_test_fc(stpp_feat_multiplier)[source]¶

Reorganize the shape of the fully connected layer at testing time, in order to improve testing efficiency.

Parameters: stpp_feat_multiplier (int) – Total number of parts.
Returns: Whether the shape transformation is ready for testing.
Return type: bool

class mmaction.models.heads.SlowFastHead(num_classes, in_channels, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', dropout_ratio=0.8, init_std=0.01, **kwargs)[source]¶

The classification head for SlowFast.

Parameters

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’).
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
dropout_ratio (float) – Probability of dropout layer. Default: 0.8.
init_std (float) – Std value for Initiation. Default: 0.01.
kwargs (dict, optional) – Any keyword argument to be used to initialize the head.

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The classification scores for input samples.
Return type: torch.Tensor

init_weights()[source]¶: Initialize the parameters from scratch.

class mmaction.models.heads.TPNHead(*args, **kwargs)[source]¶

Class head for TPN.

Parameters

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’).
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
consensus (dict) – Consensus config dict.
dropout_ratio (float) – Probability of dropout layer. Default: 0.4.
init_std (float) – Std value for Initiation. Default: 0.01.
multi_class (bool) – Determines whether it is a multi-class recognition task. Default: False.
label_smooth_eps (float) – Epsilon used in label smooth. Reference: https://arxiv.org/abs/1906.02629. Default: 0.

forward(x, num_segs=None, fcn_test=False)[source]¶

Defines the computation performed at every call.

Parameters

x (torch.Tensor) – The input data.
num_segs (int | None) – Number of segments into which a video is divided. Default: None.
fcn_test (bool) – Whether to apply full convolution (fcn) testing. Default: False.

Returns

The classification scores for input samples.

Return type

torch.Tensor

class mmaction.models.heads.TRNHead(num_classes, in_channels, num_segments=8, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', relation_type='TRNMultiScale', hidden_dim=256, dropout_ratio=0.8, init_std=0.001, **kwargs)[source]¶

Class head for TRN.

Parameters

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
num_segments (int) – Number of frame segments. Default: 8.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’)
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
relation_type (str) – The relation module type. Choices are ‘TRN’ or ‘TRNMultiScale’. Default: ‘TRNMultiScale’.
hidden_dim (int) – The dimension of hidden layer of MLP in relation module. Default: 256.
dropout_ratio (float) – Probability of dropout layer. Default: 0.8.
init_std (float) – Std value for Initiation. Default: 0.001.
kwargs (dict, optional) – Any keyword argument to be used to initialize the head.

forward(x, num_segs)[source]¶

Defines the computation performed at every call.

Parameters

x (torch.Tensor) – The input data.
num_segs (int) – Unused in TRNHead. By default, num_segs is equal to clip_len * num_clips * num_crops, which is generated automatically in the Recognizer forward phase and is not needed by TRN models. What TRN models need is self.num_segments, a hyperparameter used to build them.

Returns

The classification scores for input samples.

Return type

torch.Tensor

init_weights()[source]¶: Initialize the parameters from scratch.

class mmaction.models.heads.TSMHead(num_classes, in_channels, num_segments=8, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', consensus={'dim': 1, 'type': 'AvgConsensus'}, dropout_ratio=0.8, init_std=0.001, is_shift=True, temporal_pool=False, **kwargs)[source]¶

Class head for TSM.

Parameters

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
num_segments (int) – Number of frame segments. Default: 8.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’)
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
consensus (dict) – Consensus config dict.
dropout_ratio (float) – Probability of dropout layer. Default: 0.8.
init_std (float) – Std value for initialization. Default: 0.001.
is_shift (bool) – Indicating whether the feature is shifted. Default: True.
temporal_pool (bool) – Indicating whether feature is temporal pooled. Default: False.
kwargs (dict, optional) – Any keyword argument to be used to initialize the head.
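Heads are normally described by a config dict, in the same style as the defaults shown in this reference. The fragment below is a sketch of such a dict for TSMHead. The values num_classes=400 and in_channels=2048 are illustrative assumptions (for instance, a ResNet-50 backbone on a 400-class dataset), and using the class name as the `type` key assumes the head is registered under that name; the remaining values mirror the defaults in the signature above.

```python
# Illustrative config for a TSM classification head.
# num_classes and in_channels are assumed values, not library defaults.
tsm_head_cfg = dict(
    type='TSMHead',
    num_classes=400,
    in_channels=2048,
    num_segments=8,
    loss_cls=dict(type='CrossEntropyLoss'),
    spatial_type='avg',
    consensus=dict(type='AvgConsensus', dim=1),
    dropout_ratio=0.8,
    init_std=0.001,
    is_shift=True,
    temporal_pool=False,
)

print(tsm_head_cfg['type'], tsm_head_cfg['num_segments'])  # TSMHead 8
```

Keeping num_segments identical in the head and in the ResNetTSM backbone is a sensible precaution, since both reshape features by the number of frame segments.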

forward(x, num_segs)[source]¶

Defines the computation performed at every call.

Parameters

x (torch.Tensor) – The input data.
num_segs (int) – Unused in TSMHead. By default, num_segs is equal to clip_len * num_clips * num_crops, which is generated automatically in the Recognizer forward phase and is not needed by TSM models. What TSM models need is self.num_segments, a hyperparameter used to build them.

Returns

The classification scores for input samples.

Return type

torch.Tensor

init_weights()[source]¶: Initialize the parameters from scratch.

class mmaction.models.heads.TSNHead(num_classes, in_channels, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', consensus={'dim': 1, 'type': 'AvgConsensus'}, dropout_ratio=0.4, init_std=0.01, **kwargs)[source]¶

Class head for TSN.

Parameters

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’).
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
consensus (dict) – Consensus config dict.
dropout_ratio (float) – Probability of dropout layer. Default: 0.4.
init_std (float) – Std value for Initiation. Default: 0.01.
kwargs (dict, optional) – Any keyword argument to be used to initialize the head.

forward(x, num_segs)[source]¶

Defines the computation performed at every call.

Parameters

x (torch.Tensor) – The input data.
num_segs (int) – Number of segments into which a video is divided.

Returns

The classification scores for input samples.

Return type

torch.Tensor
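The default consensus config, dict(type='AvgConsensus', dim=1), suggests that per-segment scores are averaged over the segment dimension to obtain one score vector per video. The pure-Python sketch below illustrates that idea; the helper name `avg_consensus` and the assumption that segments of the same video are stored contiguously are illustrative and not taken from the head's source.

```python
def avg_consensus(scores, num_segs):
    """Average per-segment class scores into per-video scores.

    scores: list of N * num_segs rows, each a list of C class scores,
            ordered so that the segments of one video are contiguous.
    Returns a list of N rows, each holding C averaged scores.
    """
    if len(scores) % num_segs != 0:
        raise ValueError('number of rows must be a multiple of num_segs')
    num_classes = len(scores[0])
    videos = []
    for start in range(0, len(scores), num_segs):
        group = scores[start:start + num_segs]
        videos.append([
            sum(row[c] for row in group) / num_segs
            for c in range(num_classes)
        ])
    return videos


segment_scores = [[1.0, 3.0], [3.0, 5.0],   # video 0, two segments
                  [0.0, 2.0], [4.0, 2.0]]   # video 1, two segments
print(avg_consensus(segment_scores, num_segs=2))  # [[2.0, 4.0], [2.0, 2.0]]
```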

init_weights()[source]¶: Initialize the parameters from scratch.

class mmaction.models.heads.X3DHead(num_classes, in_channels, loss_cls={'type': 'CrossEntropyLoss'}, spatial_type='avg', dropout_ratio=0.5, init_std=0.01, fc1_bias=False)[source]¶

Classification head for X3D.

Parameters

num_classes (int) – Number of classes to be classified.
in_channels (int) – Number of channels in input feature.
loss_cls (dict) – Config for building loss. Default: dict(type=’CrossEntropyLoss’)
spatial_type (str) – Pooling type in spatial dimension. Default: ‘avg’.
dropout_ratio (float) – Probability of dropout layer. Default: 0.5.
init_std (float) – Std value for Initiation. Default: 0.01.
fc1_bias (bool) – If the first fc layer has bias. Default: False.

forward(x)[source]¶

Defines the computation performed at every call.

Parameters: x (torch.Tensor) – The input data.
Returns: The classification scores for input samples.
Return type: torch.Tensor

init_weights()[source]¶: Initialize the parameters from scratch.

necks¶

class mmaction.models.necks.TPN(in_channels, out_channels, spatial_modulation_cfg=None, temporal_modulation_cfg=None, upsample_cfg=None, downsample_cfg=None, level_fusion_cfg=None, aux_head_cfg=None, flow_type='cascade')[source]¶

TPN neck.

This module is proposed in Temporal Pyramid Network for Action Recognition.

Parameters

in_channels (tuple[int]) – Channel numbers of input features tuple.
out_channels (int) – Channel number of output feature.
spatial_modulation_cfg (dict | None) – Config for spatial modulation layers. Required keys are in_channels and out_channels. Default: None.
temporal_modulation_cfg (dict | None) – Config for temporal modulation layers. Default: None.
upsample_cfg (dict | None) – Config for upsample layers. The keys are the same as those in nn.Upsample. Default: None.
downsample_cfg (dict | None) – Config for downsample layers. Default: None.
level_fusion_cfg (dict | None) – Config for level fusion layers. Required keys are ‘in_channels’, ‘mid_channels’, ‘out_channels’. Default: None.
aux_head_cfg (dict | None) – Config for aux head layers. Required keys are ‘out_channels’. Default: None.
flow_type (str) – Flow type to combine the features. Options are ‘cascade’ and ‘parallel’. Default: ‘cascade’.

forward(x, target=None)[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

losses¶

class mmaction.models.losses.BCELossWithLogits(loss_weight=1.0, class_weight=None)[source]¶

Binary Cross Entropy Loss with logits.

Parameters

loss_weight (float) – Factor scalar multiplied on the loss. Default: 1.0.
class_weight (list[float] | None) – Loss weight for each class. If set as None, use the same weight 1 for all classes. Only applies to CrossEntropyLoss and BCELossWithLogits (should not be set when using other losses). Default: None.
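As a reference for what a binary cross entropy on logits computes, here is a small pure-Python sketch for a single sample. It uses the standard numerically stable form of the loss. The function name `bce_with_logits`, the mean reduction over classes, and the way class_weight multiplies each per-class term are assumptions for illustration; the library class may reduce or weight differently.

```python
import math


def bce_with_logits(logits, targets, class_weight=None, loss_weight=1.0):
    """Mean binary cross entropy over classes for one sample.

    For a logit x and a target z in {0, 1}, the per-class loss is
        max(x, 0) - x * z + log(1 + exp(-|x|)),
    which equals -[z * log(sigmoid(x)) + (1 - z) * log(1 - sigmoid(x))]
    but avoids overflow for large |x|.
    """
    if class_weight is None:
        class_weight = [1.0] * len(logits)
    total = 0.0
    for x, z, w in zip(logits, targets, class_weight):
        per_class = max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
        total += w * per_class
    return loss_weight * total / len(logits)


# With zero logits every class contributes log(2), whatever the target.
print(bce_with_logits([0.0, 0.0], [1.0, 0.0]))
```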

class mmaction.models.losses.BMNLoss[source]¶

BMN Loss.

From paper https://arxiv.org/abs/1907.09702, code https://github.com/JJBOY/BMN-Boundary-Matching-Network. It calculates the loss for the BMN model. This loss is a weighted sum of:

1) temporal evaluation loss based on confidence score of start and end positions.
2) proposal evaluation regression loss based on confidence scores of candidate proposals.
3) proposal evaluation classification loss based on classification results of candidate proposals.

forward(pred_bm, pred_start, pred_end, gt_iou_map, gt_start, gt_end, bm_mask, weight_tem=1.0, weight_pem_reg=10.0, weight_pem_cls=1.0)[source]¶

Calculate Boundary Matching Network Loss.

Parameters

pred_bm (torch.Tensor) – Predicted confidence score for boundary matching map.
pred_start (torch.Tensor) – Predicted confidence score for start.
pred_end (torch.Tensor) – Predicted confidence score for end.
gt_iou_map (torch.Tensor) – Groundtruth score for boundary matching map.
gt_start (torch.Tensor) – Groundtruth temporal_iou score for start.
gt_end (torch.Tensor) – Groundtruth temporal_iou score for end.
bm_mask (torch.Tensor) – Boundary-Matching mask.
weight_tem (float) – Weight for tem loss. Default: 1.0.
weight_pem_reg (float) – Weight for pem regression loss. Default: 10.0.
weight_pem_cls (float) – Weight for pem classification loss. Default: 1.0.

返回

(loss, tem_loss, pem_reg_loss, pem_cls_loss). Loss is the bmn loss, tem_loss is the temporal evaluation loss, pem_reg_loss is the proposal evaluation regression loss, pem_cls_loss is the proposal evaluation classification loss.

返回类型

tuple([torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor])
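As a rough illustration of how the three terms combine, the sketch below forms the weighted sum with the default weights from the forward() signature. The helper name combine_bmn_losses is hypothetical and plain floats stand in for tensors; it is not the library's implementation.

```python
def combine_bmn_losses(tem_loss, pem_reg_loss, pem_cls_loss,
                       weight_tem=1.0, weight_pem_reg=10.0,
                       weight_pem_cls=1.0):
    """Weighted sum of the three BMN loss terms (illustrative helper)."""
    return (weight_tem * tem_loss
            + weight_pem_reg * pem_reg_loss
            + weight_pem_cls * pem_cls_loss)


# With the default weights, the regression term is scaled up by 10.
total = combine_bmn_losses(0.5, 0.02, 0.3)
print(total)
```

The larger default weight on the regression term suggests that its raw magnitude is expected to be small relative to the other two terms.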

static pem_cls_loss(pred_score, gt_iou_map, mask, threshold=0.9, ratio_range=(1.05, 21), eps=1e-05)[源代码]¶

Calculate Proposal Evaluation Module Classification Loss.

参数

pred_score (torch.Tensor) – Predicted temporal_iou score by BMN.
gt_iou_map (torch.Tensor) – Groundtruth temporal_iou score.
mask (torch.Tensor) – Boundary-Matching mask.
threshold (float) – Threshold of temporal_iou for positive instances. Default: 0.9.
ratio_range (tuple) – Lower bound and upper bound for ratio. Default: (1.05, 21)
eps (float) – Epsilon for small value. Default: 1e-5

Returns

Proposal evaluation classification loss.

返回类型

torch.Tensor

static pem_reg_loss(pred_score, gt_iou_map, mask, high_temporal_iou_threshold=0.7, low_temporal_iou_threshold=0.3)[源代码]¶

Calculate Proposal Evaluation Module Regression Loss.

参数

pred_score (torch.Tensor) – Predicted temporal_iou score by BMN.
gt_iou_map (torch.Tensor) – Groundtruth temporal_iou score.
mask (torch.Tensor) – Boundary-Matching mask.
high_temporal_iou_threshold (float) – Higher threshold of temporal_iou. Default: 0.7.
low_temporal_iou_threshold (float) – Lower threshold of temporal_iou. Default: 0.3.

Returns

Proposal evaluation regression loss.

返回类型

torch.Tensor

static tem_loss(pred_start, pred_end, gt_start, gt_end)[源代码]¶

Calculate Temporal Evaluation Module Loss.

This function calculates the binary_logistic_regression_loss for start and end respectively and returns the sum of the two losses.

参数

pred_start (torch.Tensor) – Predicted start score by BMN model.
pred_end (torch.Tensor) – Predicted end score by BMN model.
gt_start (torch.Tensor) – Groundtruth confidence score for start.
gt_end (torch.Tensor) – Groundtruth confidence score for end.

返回

Returned binary logistic loss.

返回类型

torch.Tensor

class mmaction.models.losses.BaseWeightedLoss(loss_weight=1.0)[源代码]¶

Base class for loss.

All subclasses should override the _forward() method, which returns the normal loss without loss weights.

Parameters

loss_weight (float) – Factor scalar multiplied on the loss. Default: 1.0.

forward(*args, **kwargs)[源代码]¶

Defines the computation performed at every call.

参数

*args – The positional arguments for the corresponding loss.
**kwargs – The keyword arguments for the corresponding loss.

返回

The calculated loss.

返回类型

torch.Tensor

class mmaction.models.losses.BinaryLogisticRegressionLoss[源代码]¶

Binary Logistic Regression Loss.

It will calculate binary logistic regression loss given reg_score and label.

forward(reg_score, label, threshold=0.5, ratio_range=(1.05, 21), eps=1e-05)[源代码]¶

Calculate Binary Logistic Regression Loss.

参数

reg_score (torch.Tensor) – Predicted score by model.
label (torch.Tensor) – Groundtruth labels.
threshold (float) – Threshold for positive instances. Default: 0.5.
ratio_range (tuple) – Lower bound and upper bound for ratio. Default: (1.05, 21)
eps (float) – Epsilon for small value. Default: 1e-5.

返回

Returned binary logistic loss.

返回类型

torch.Tensor
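A minimal pure-Python sketch of a class-balanced binary logistic loss with these parameters follows. It assumes the weighting scheme used in the BMN reference code (positives and negatives are re-weighted by the clipped ratio of total entries to positive entries), and operates on flat lists instead of tensors, so treat it as an approximation of the idea rather than the library code.

```python
import math


def binary_logistic_regression_loss(reg_score, label, threshold=0.5,
                                    ratio_range=(1.05, 21), eps=1e-5):
    """Class-balanced binary logistic loss over flat lists (sketch)."""
    # Entries whose label exceeds the threshold count as positives.
    pmask = [1.0 if y > threshold else 0.0 for y in label]
    num_positive = max(sum(pmask), 1.0)
    # Ratio of all entries to positives, clipped to ratio_range.
    ratio = len(label) / num_positive
    ratio = min(max(ratio, ratio_range[0]), ratio_range[1])
    coef_neg = 0.5 * ratio / (ratio - 1)
    coef_pos = 0.5 * ratio
    total = 0.0
    for p, m in zip(reg_score, pmask):
        total += coef_pos * m * math.log(p + eps)
        total += coef_neg * (1.0 - m) * math.log(1.0 - p + eps)
    return -total / len(label)


labels = [1, 0, 0, 0]
good = binary_logistic_regression_loss([0.9, 0.1, 0.1, 0.1], labels)
bad = binary_logistic_regression_loss([0.1, 0.9, 0.9, 0.9], labels)
print(good, bad)
```

Clipping the ratio keeps the positive weight bounded when positives are very rare, and the lower bound above 1 keeps the negative coefficient finite.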

class mmaction.models.losses.CrossEntropyLoss(loss_weight=1.0, class_weight=None)[源代码]¶

Cross Entropy Loss.

Supports two kinds of labels and their corresponding loss types. The loss type is detected from the shapes of cls_score and label.

1) Hard label: This label is an integer array and all of the elements are in the range [0, num_classes - 1]. This label’s shape should be cls_score’s shape with the num_classes dimension removed.
2) Soft label (probability distribution over classes): This label is a probability distribution and all of the elements are in the range [0, 1]. This label’s shape must be the same as cls_score. For now, only 2-dim soft labels are supported.

参数

loss_weight (float) – Factor scalar multiplied on the loss. Default: 1.0.
class_weight (list[float] | None) – Loss weight for each class. If set as None, use the same weight 1 for all classes. Only applies to CrossEntropyLoss and BCELossWithLogits (should not be set when using other losses). Default: None.
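The difference between hard and soft labels can be sketched without any tensor library. The example below is an assumption-laden illustration: it uses nested lists, and it detects the label kind by checking whether each label is a sequence, which stands in for the shape comparison described above.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of one score row."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]


def cross_entropy(cls_score, label):
    """Mean cross entropy for hard (int) or soft (distribution) labels."""
    is_soft = isinstance(label[0], (list, tuple))
    losses = []
    for row, target in zip(cls_score, label):
        logp = log_softmax(row)
        if is_soft:
            # Soft label: expectation of -log p under the target distribution.
            losses.append(-sum(t * lp for t, lp in zip(target, logp)))
        else:
            # Hard label: pick the log-probability of the target class.
            losses.append(-logp[target])
    return sum(losses) / len(losses)


scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
hard = cross_entropy(scores, [0, 2])
soft = cross_entropy(scores, [[1, 0, 0], [0, 0, 1]])
print(hard, soft)
```

A one-hot soft label reduces to the hard-label case, which is why the two calls agree.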

class mmaction.models.losses.HVULoss(categories=('action', 'attribute', 'concept', 'event', 'object', 'scene'), category_nums=(739, 117, 291, 69, 1678, 248), category_loss_weights=(1, 1, 1, 1, 1, 1), loss_type='all', with_mask=False, reduction='mean', loss_weight=1.0)[源代码]¶

Calculate the BCELoss for HVU.

参数

categories (tuple[str]) – Names of tag categories, tags are organized in this order. Default: (‘action’, ‘attribute’, ‘concept’, ‘event’, ‘object’, ‘scene’).
category_nums (tuple[int]) – Number of tags for each category. Default: (739, 117, 291, 69, 1678, 248).
category_loss_weights (tuple[float]) – Loss weights of categories; applies only if loss_type == ‘individual’. The weights are normalized so that they sum to 1, so any positive numbers can be given as loss weights. Default: (1, 1, 1, 1, 1, 1).
loss_type (str) – The loss type to calculate: either the BCELoss over all tags, or the BCELoss for the tags in each category. Choices are ‘individual’ or ‘all’. Default: ‘all’.
with_mask (bool) – Some tag categories are missing for some video clips. If with_mask == True, no loss is calculated for these missing categories. Otherwise, the missing categories are treated as negative samples. Default: False.
reduction (str) – Reduction way. Choices are ‘mean’ or ‘sum’. Default: ‘mean’.
loss_weight (float) – The loss weight. Default: 1.0.
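The normalization of category_loss_weights can be illustrated with a short sketch. The helper names below are hypothetical, and the per-category losses are plain floats; how the library computes those per-category losses (including masking) is not shown here.

```python
def normalize_weights(weights):
    """Scale positive weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def combine_category_losses(category_losses, category_loss_weights):
    """Weighted sum of per-category losses using normalized weights."""
    weights = normalize_weights(category_loss_weights)
    return sum(w * l for w, l in zip(weights, category_losses))


weights = normalize_weights((1, 1, 2))
combined = combine_category_losses([0.4, 0.8, 0.2], (1, 1, 2))
print(weights, combined)
```

Because of the normalization, passing (1, 1, 2) or (10, 10, 20) yields the same combined loss.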

class mmaction.models.losses.NLLLoss(loss_weight=1.0)[源代码]¶

NLL Loss.

It will calculate NLL loss given cls_score and label.

class mmaction.models.losses.OHEMHingeLoss[源代码]¶

This class is the core implementation for the completeness loss in the paper.

It computes class-wise hinge loss and performs online hard example mining (OHEM).

static backward(ctx, grad_output)[源代码]¶

Defines a formula for differentiating the operation.

This function is to be overridden by all subclasses.

It must accept a context ctx as the first argument, followed by as many outputs as forward() returned, and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad, a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, pred, labels, is_positive, ohem_ratio, group_size)[源代码]¶

Calculate OHEM hinge loss.

参数

pred (torch.Tensor) – Predicted completeness score.
labels (torch.Tensor) – Groundtruth class label.
is_positive (int) – Set to 1 when proposals are positive and set to -1 when proposals are incomplete.
ohem_ratio (float) – Ratio of hard examples.
group_size (int) – Number of proposals sampled per video.

返回

Returned class-wise hinge loss.

返回类型

torch.Tensor
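The combination of hinge loss and online hard example mining can be sketched as follows. This is a simplified illustration under stated assumptions: scores is a flat list holding each proposal's completeness score for its own class (the class lookup via labels is omitted), and only the hardest ohem_ratio fraction of each group contributes to the total.

```python
def ohem_hinge_loss(scores, is_positive, ohem_ratio, group_size):
    """Sum of the hardest hinge losses within each group (sketch)."""
    # Hinge loss per proposal; is_positive is 1 or -1.
    losses = [max(0.0, 1.0 - is_positive * s) for s in scores]
    keep = int(group_size * ohem_ratio)
    total = 0.0
    for start in range(0, len(losses), group_size):
        # Sort each group from hardest to easiest and keep the top slice.
        group = sorted(losses[start:start + group_size], reverse=True)
        total += sum(group[:keep])
    return total


loss = ohem_hinge_loss([2.0, 0.5, -1.0, 0.0], is_positive=1,
                       ohem_ratio=0.5, group_size=4)
print(loss)
```

Here the per-proposal hinge losses are 0.0, 0.5, 2.0 and 1.0, and only the two largest are kept.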

class mmaction.models.losses.SSNLoss[源代码]¶

static activity_loss(activity_score, labels, activity_indexer)[源代码]¶

Activity Loss.

It will calculate activity loss given activity_score and label.

Parameters

activity_score (torch.Tensor) – Predicted activity score.
labels (torch.Tensor) – Groundtruth class label.
activity_indexer (torch.Tensor) – Index slices of proposals.

Returns

Returned cross entropy loss.

Return type

torch.Tensor

static classwise_regression_loss(bbox_pred, labels, bbox_targets, regression_indexer)[源代码]¶

Classwise Regression Loss.

It will calculate classwise_regression loss given class_reg_pred and targets.

Parameters

bbox_pred (torch.Tensor) – Predicted interval center and span of positive proposals.
labels (torch.Tensor) – Groundtruth class label.
bbox_targets (torch.Tensor) – Groundtruth center and span of positive proposals.
regression_indexer (torch.Tensor) – Index slices of positive proposals.

Returns

Returned class-wise regression loss.

Return type

torch.Tensor

static completeness_loss(completeness_score, labels, completeness_indexer, positive_per_video, incomplete_per_video, ohem_ratio=0.17)[源代码]¶

Completeness Loss.

It will calculate completeness loss given completeness_score and label.

Parameters

completeness_score (torch.Tensor) – Predicted completeness score.
labels (torch.Tensor) – Groundtruth class label.
completeness_indexer (torch.Tensor) – Index slices of positive and incomplete proposals.
positive_per_video (int) – Number of positive proposals sampled per video.
incomplete_per_video (int) – Number of incomplete proposals sampled per video.
ohem_ratio (float) – Ratio of online hard example mining. Default: 0.17.

Returns

Returned class-wise completeness loss.

Return type

torch.Tensor

forward(activity_score, completeness_score, bbox_pred, proposal_type, labels, bbox_targets, train_cfg)[源代码]¶

Calculate the SSN loss.

参数

activity_score (torch.Tensor) – Predicted activity score.
completeness_score (torch.Tensor) – Predicted completeness score.
bbox_pred (torch.Tensor) – Predicted interval center and span of positive proposals.
proposal_type (torch.Tensor) – Type index slices of proposals.
labels (torch.Tensor) – Groundtruth class label.
bbox_targets (torch.Tensor) – Groundtruth center and span of positive proposals.
train_cfg (dict) – Config for training.

返回

(loss_activity, loss_completeness, loss_reg). Loss_activity is the activity loss, loss_completeness is the class-wise completeness loss, loss_reg is the class-wise regression loss.

返回类型

dict([torch.Tensor, torch.Tensor, torch.Tensor])

mmaction.datasets¶

datasets¶

class mmaction.datasets.AVADataset(ann_file, exclude_file, pipeline, label_file=None, filename_tmpl='img_{:05}.jpg', proposal_file=None, person_det_score_thr=0.9, num_classes=81, custom_classes=None, data_prefix=None, test_mode=False, modality='RGB', num_max_proposals=1000, timestamp_start=900, timestamp_end=1800)[源代码]¶

AVA dataset for spatial temporal detection.

Based on official AVA annotation files, the dataset loads raw frames, bounding boxes, proposals and applies specified transformations to return a dict containing the frame tensors and other information.

This dataset can load information from the following files:

ann_file -> ava_{train, val}_{v2.1, v2.2}.csv
exclude_file -> ava_{train, val}_excluded_timestamps_{v2.1, v2.2}.csv
label_file -> ava_action_list_{v2.1, v2.2}.pbtxt /
              ava_action_list_{v2.1, v2.2}_for_activitynet_2019.pbtxt
proposal_file -> ava_dense_proposals_{train, val}.FAIR.recall_93.9.pkl

Particularly, the proposal_file is a pickle file which contains img_key (in format of {video_id},{timestamp}). Example of a pickle file:

{
    ...
    '0f39OWEqJ24,0902':
        array([[0.011   , 0.157   , 0.655   , 0.983   , 0.998163]]),
    '0f39OWEqJ24,0912':
        array([[0.054   , 0.088   , 0.91    , 0.998   , 0.068273],
               [0.016   , 0.161   , 0.519   , 0.974   , 0.984025],
               [0.493   , 0.283   , 0.981   , 0.984   , 0.983621]]),
    ...
}

参数

ann_file (str) – Path to the annotation file like ava_{train, val}_{v2.1, v2.2}.csv.
exclude_file (str) – Path to the excluded timestamp file like ava_{train, val}_excluded_timestamps_{v2.1, v2.2}.csv.
pipeline (list[dict | callable]) – A sequence of data transforms.
label_file (str) – Path to the label file like ava_action_list_{v2.1, v2.2}.pbtxt or ava_action_list_{v2.1, v2.2}_for_activitynet_2019.pbtxt. Default: None.
filename_tmpl (str) – Template for each filename. Default: ‘img_{:05}.jpg’.
proposal_file (str) – Path to the proposal file like ava_dense_proposals_{train, val}.FAIR.recall_93.9.pkl. Default: None.
person_det_score_thr (float) – The threshold of person detection scores, bboxes with scores above the threshold will be used. Default: 0.9. Note that 0 <= person_det_score_thr <= 1. If no proposal has detection score larger than the threshold, the one with the largest detection score will be used.
num_classes (int) – The number of classes of the dataset. Default: 81. (AVA has 80 action classes, another 1-dim is added for potential usage)
custom_classes (list[int]) – A subset of class ids from origin dataset. Please note that 0 should NOT be selected, and num_classes should be equal to len(custom_classes) + 1
data_prefix (str) – Path to a directory where videos are held. Default: None.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
modality (str) – Modality of data. Support ‘RGB’, ‘Flow’. Default: ‘RGB’.
num_max_proposals (int) – Max proposals number to store. Default: 1000.
timestamp_start (int) – The start point of included timestamps. The default value is taken from the official website. Default: 900.
timestamp_end (int) – The end point of included timestamps. The default value is taken from the official website. Default: 1800.
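The behaviour described for person_det_score_thr can be sketched on the proposal rows from the pickle example above. The function name select_proposals is hypothetical, rows are plain lists of [x1, y1, x2, y2, score], and the use of >= for the comparison is an assumption.

```python
def select_proposals(proposals, person_det_score_thr=0.9):
    """Keep proposals at or above the threshold, else the best one."""
    if not proposals:
        return []
    kept = [p for p in proposals if p[4] >= person_det_score_thr]
    if not kept:
        # Fall back to the single proposal with the largest score.
        kept = [max(proposals, key=lambda p: p[4])]
    return kept


proposals = [
    [0.054, 0.088, 0.91, 0.998, 0.068273],
    [0.016, 0.161, 0.519, 0.974, 0.984025],
    [0.493, 0.283, 0.981, 0.984, 0.983621],
]
kept = select_proposals(proposals)
fallback = select_proposals(proposals, person_det_score_thr=0.99)
print(len(kept), fallback)
```

The fallback guarantees at least one box per key frame whenever any proposal exists.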

dump_results(results, out)[源代码]¶: Dump data to json/yaml/pickle strings or files.

evaluate(results, metrics=('mAP'), metric_options=None, logger=None)[源代码]¶

Perform evaluation for common datasets.

参数

results (list) – Output results.
metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘mAP’.
metric_options (dict | None) – Dict for metric options. Default: None.
logger (logging.Logger | None) – Logger for recording. Default: None.
deprecated_kwargs (dict) – Used for containing deprecated arguments. See ‘https://github.com/open-mmlab/mmaction2/pull/286’.

返回

Evaluation results dict.

返回类型

dict

load_annotations()[源代码]¶: Load the annotation according to ann_file into video_infos.

prepare_test_frames(idx)[源代码]¶: Prepare the frames for testing given the index.

prepare_train_frames(idx)[源代码]¶: Prepare the frames for training given the index.

class mmaction.datasets.ActivityNetDataset(ann_file, pipeline, data_prefix=None, test_mode=False)[源代码]¶

ActivityNet dataset for temporal action localization.

The dataset loads raw features and applies specified transforms to return a dict containing the frame tensors and other information.

The ann_file is a json file with multiple objects. Each object is keyed by the name of a video, and its value holds the total frames of the video, total seconds of the video, annotations of the video, feature frames (frames covered by features) of the video, fps and rfps. Example of an annotation file:

{
    "v_--1DO2V4K74":  {
        "duration_second": 211.53,
        "duration_frame": 6337,
        "annotations": [
            {
                "segment": [
                    30.025882995319815,
                    205.2318595943838
                ],
                "label": "Rock climbing"
            }
        ],
        "feature_frame": 6336,
        "fps": 30.0,
        "rfps": 29.9579255898
    },
    "v_--6bJUbfpnQ": {
        "duration_second": 26.75,
        "duration_frame": 647,
        "annotations": [
            {
                "segment": [
                    2.578755070202808,
                    24.914101404056165
                ],
                "label": "Drinking beer"
            }
        ],
        "feature_frame": 624,
        "fps": 24.0,
        "rfps": 24.1869158879
    },
    ...
}
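Reading a file of this shape needs only the standard json module. The sketch below parses one record copied from the example and expresses each annotated segment as a fraction of the video duration; the variable names and the normalization step are illustrative assumptions, not the dataset's actual loading code.

```python
import json

ann_text = """
{
    "v_--6bJUbfpnQ": {
        "duration_second": 26.75,
        "duration_frame": 647,
        "annotations": [
            {"segment": [2.578755070202808, 24.914101404056165],
             "label": "Drinking beer"}
        ],
        "feature_frame": 624,
        "fps": 24.0,
        "rfps": 24.1869158879
    }
}
"""

# Flatten the mapping into a list of per-video dicts.
video_infos = []
for video_name, info in json.loads(ann_text).items():
    video_infos.append(dict(video_name=video_name, **info))

for info in video_infos:
    for ann in info["annotations"]:
        start, end = ann["segment"]
        # Express the segment relative to the video duration.
        rel = (start / info["duration_second"],
               end / info["duration_second"])
        print(info["video_name"], ann["label"], rel)
```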

参数

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
data_prefix (str | None) – Path to a directory where videos are held. Default: None.
test_mode (bool) – Store True when building test or validation dataset. Default: False.

dump_results(results, out, output_format, version='VERSION 1.3')[源代码]¶: Dump data to json/csv files.

evaluate(results, metrics='AR@AN', metric_options={'AR@AN': {'max_avg_proposals': 100, 'temporal_iou_thresholds': array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95])}}, logger=None, **deprecated_kwargs)[源代码]¶

Evaluation in feature dataset.

参数

results (list[dict]) – Output results.
metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘AR@AN’.
metric_options (dict) – Dict for metric options. Options are max_avg_proposals, temporal_iou_thresholds for AR@AN. default: {'AR@AN': dict(max_avg_proposals=100, temporal_iou_thresholds=np.linspace(0.5, 0.95, 10))}.
logger (logging.Logger | None) – Training logger. Defaults: None.
deprecated_kwargs (dict) – Used for containing deprecated arguments. See ‘https://github.com/open-mmlab/mmaction2/pull/286’.

返回

Evaluation results for evaluation metrics.

返回类型

dict

load_annotations()[源代码]¶: Load the annotation according to ann_file into video_infos.

prepare_test_frames(idx)[源代码]¶: Prepare the frames for testing given the index.

prepare_train_frames(idx)[源代码]¶: Prepare the frames for training given the index.

static proposals2json(results, show_progress=False)[源代码]¶

Convert all proposals to a final dict(json) format.

参数

results (list[dict]) – All proposals.
show_progress (bool) – Whether to show the progress bar. Defaults: False.

返回

The final result dict. E.g.

dict(video-1=[dict(segment=[1.1, 2.0], score=0.9),
              dict(segment=[50.1, 129.3], score=0.6)])

返回类型

dict

class mmaction.datasets.AudioDataset(ann_file, pipeline, suffix='.wav', **kwargs)[源代码]¶

Audio dataset for video recognition. Extracts the audio feature on-the-fly. Annotation file can be that of the rawframe dataset, or:

some/directory-1.wav 163 1
some/directory-2.wav 122 1
some/directory-3.wav 258 2
some/directory-4.wav 234 2
some/directory-5.wav 295 3
some/directory-6.wav 121 3

参数

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
suffix (str) – The suffix of the audio file. Default: ‘.wav’.
kwargs (dict) – Other keyword args for BaseDataset.

load_annotations()[源代码]¶: Load annotation file to get video information.

class mmaction.datasets.AudioFeatureDataset(ann_file, pipeline, suffix='.npy', **kwargs)[源代码]¶

Audio feature dataset for video recognition. Reads the features extracted off-line. Annotation file can be that of the rawframe dataset, or:

some/directory-1.npy 163 1
some/directory-2.npy 122 1
some/directory-3.npy 258 2
some/directory-4.npy 234 2
some/directory-5.npy 295 3
some/directory-6.npy 121 3

参数

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
suffix (str) – The suffix of the audio feature file. Default: ‘.npy’.
kwargs (dict) – Other keyword args for BaseDataset.

load_annotations()[源代码]¶: Load annotation file to get video information.

class mmaction.datasets.AudioVisualDataset(ann_file, pipeline, audio_prefix, **kwargs)[源代码]¶

Dataset that reads both audio and visual data, supporting both rawframes and videos. The annotation file is the same as that of the rawframe dataset, such as:

some/directory-1 163 1
some/directory-2 122 1
some/directory-3 258 2
some/directory-4 234 2
some/directory-5 295 3
some/directory-6 121 3

参数

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
audio_prefix (str) – Directory of the audio files.
kwargs (dict) – Other keyword args for RawframeDataset. video_prefix is also allowed if pipeline is designed for videos.

load_annotations()[源代码]¶: Load annotation file to get video information.

class mmaction.datasets.BaseDataset(ann_file, pipeline, data_prefix=None, test_mode=False, multi_class=False, num_classes=None, start_index=1, modality='RGB', sample_by_class=False, power=None)[源代码]¶

Base class for datasets.

All datasets that process video should subclass it. All subclasses should override:

- Method load_annotations, which loads information from an annotation file.
- Method prepare_train_frames, which provides train data.
- Method prepare_test_frames, which provides test data.

参数

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
data_prefix (str | None) – Path to a directory where videos are held. Default: None.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
multi_class (bool) – Determines whether the dataset is a multi-class dataset. Default: False.
num_classes (int | None) – Number of classes of the dataset, used in multi-class datasets. Default: None.
start_index (int) – Specify a start index for frames in consideration of different filename format. However, when taking videos as input, it should be set to 0, since frames loaded from videos count from 0. Default: 1.
modality (str) – Modality of data. Support ‘RGB’, ‘Flow’, ‘Audio’. Default: ‘RGB’.
sample_by_class (bool) – Sampling by class, should be set True when performing inter-class data balancing. Only compatible with multi_class == False. Only applies for training. Default: False.
power (float | None) – We support sampling data with the probability proportional to the power of its label frequency (freq ^ power) when sampling data. power == 1 indicates uniformly sampling all data; power == 0 indicates uniformly sampling all classes. Default: None.
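The effect of power can be sketched by computing the probability of picking each class. The helper below is hypothetical and assumes sampling happens in two steps: a class is chosen with probability proportional to freq ^ power, then a sample is drawn uniformly from that class.

```python
from collections import Counter


def class_sampling_probs(labels, power):
    """Probability of choosing each class, proportional to freq ** power."""
    freq = Counter(labels)
    weights = {cls: count ** power for cls, count in freq.items()}
    total = sum(weights.values())
    return {cls: w / total for cls, w in weights.items()}


labels = [0, 0, 0, 1]
uniform_over_classes = class_sampling_probs(labels, power=0)
uniform_over_data = class_sampling_probs(labels, power=1)
print(uniform_over_classes, uniform_over_data)
```

Values of power between 0 and 1 interpolate between balancing classes and following the raw label frequencies.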

static dump_results(results, out)[源代码]¶: Dump data to json/yaml/pickle strings or files.

evaluate(results, metrics='top_k_accuracy', metric_options={'top_k_accuracy': {'topk': (1, 5)}}, logger=None, **deprecated_kwargs)[源代码]¶

Perform evaluation for common datasets.

参数

results (list) – Output results.
metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘top_k_accuracy’.
metric_options (dict) – Dict for metric options. Options are topk for top_k_accuracy. Default: dict(top_k_accuracy=dict(topk=(1, 5))).
logger (logging.Logger | None) – Logger for recording. Default: None.
deprecated_kwargs (dict) – Used for containing deprecated arguments. See ‘https://github.com/open-mmlab/mmaction2/pull/286’.

返回

Evaluation results dict.

返回类型

dict

abstract load_annotations()[源代码]¶: Load the annotation according to ann_file into video_infos.

load_json_annotations()[源代码]¶: Load json annotation file to get video information.

prepare_test_frames(idx)[源代码]¶: Prepare the frames for testing given the index.

prepare_train_frames(idx)[源代码]¶: Prepare the frames for training given the index.

class mmaction.datasets.BaseMiniBatchBlending(num_classes)[源代码]¶: Base class for Image Aliasing.

class mmaction.datasets.CutmixBlending(num_classes, alpha=0.2)[源代码]¶

Implementing Cutmix in a mini-batch.

This module is proposed in CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. Code Reference https://github.com/clovaai/CutMix-PyTorch

参数

num_classes (int) – The number of classes.
alpha (float) – Parameters for Beta distribution.

do_blending(imgs, label, **kwargs)[源代码]¶: Blending images with cutmix.

static rand_bbox(img_size, lam)[源代码]¶: Generate a random bounding box.
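A sketch of the box sampling used by CutMix, following the scheme in the referenced CutMix-PyTorch code, is given below. It takes width and height directly rather than an img_size object, and the rng argument is an addition for reproducibility, so the signature differs from the method documented here.

```python
import math
import random


def rand_bbox(width, height, lam, rng=random):
    """Sample a box covering roughly (1 - lam) of the image area."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_w = int(width * cut_ratio)
    cut_h = int(height * cut_ratio)
    # Uniformly sample the box center, then clip the box to the image.
    cx = rng.randrange(width)
    cy = rng.randrange(height)
    x1 = min(max(cx - cut_w // 2, 0), width)
    y1 = min(max(cy - cut_h // 2, 0), height)
    x2 = min(max(cx + cut_w // 2, 0), width)
    y2 = min(max(cy + cut_h // 2, 0), height)
    return x1, y1, x2, y2


box = rand_bbox(224, 224, 0.7, rng=random.Random(0))
print(box)
```

Because clipping can shrink the box, CutMix implementations typically recompute lam from the actual box area before mixing the labels.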

class mmaction.datasets.HVUDataset(ann_file, pipeline, tag_categories, tag_category_nums, filename_tmpl=None, **kwargs)[源代码]¶

HVU dataset, which supports the recognition tags of multiple categories. Accepts either video annotation files or rawframe annotation files.

The dataset loads videos or raw frames and applies specified transforms to return a dict containing the frame tensors and other information.

The ann_file is a json file with multiple dictionaries, and each dictionary indicates a sample video with the filename and tags, the tags are organized as different categories. Example of a video dictionary:

{
    'filename': 'gD_G1b0wV5I_001015_001035.mp4',
    'label': {
        'concept': [250, 131, 42, 51, 57, 155, 122],
        'object': [1570, 508],
        'event': [16],
        'action': [180],
        'scene': [206]
    }
}

Example of a rawframe dictionary:

{
    'frame_dir': 'gD_G1b0wV5I_001015_001035',
    'total_frames': 61,
    'label': {
        'concept': [250, 131, 42, 51, 57, 155, 122],
        'object': [1570, 508],
        'event': [16],
        'action': [180],
        'scene': [206]
    }
}

参数

ann_file (str) – Path to the annotation file, should be a json file.
pipeline (list[dict | callable]) – A sequence of data transforms.
tag_categories (list[str]) – List of category names of tags.
tag_category_nums (list[int]) – List of number of tags in each category.
filename_tmpl (str | None) – Template for each filename. If set to None, video dataset is used. Default: None.
**kwargs – Keyword arguments for BaseDataset.

evaluate(results, metrics='mean_average_precision', metric_options=None, logger=None)[源代码]¶

Evaluation in HVU Video Dataset. We only support evaluating mAP for each tag category. Since some tag categories are missing for some videos, we cannot evaluate mAP over all tags.

参数

results (list) – Output results.
metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘mean_average_precision’.
metric_options (dict | None) – Dict for metric options. Default: None.
logger (logging.Logger | None) – Logger for recording. Default: None.

返回

Evaluation results dict.

返回类型

dict

load_annotations()[源代码]¶: Load annotation file to get video information.

load_json_annotations()[源代码]¶: Load json annotation file to get video information.

class mmaction.datasets.ImageDataset(ann_file, pipeline, **kwargs)[源代码]¶

Image dataset for action recognition, used in the Project OmniSource.

The dataset loads an image list and applies specified transforms to return a dict containing the image tensors and other information.

The ann_file is a text file with multiple lines, and each line indicates the image path and the image label, which are separated by whitespace. Example of an annotation file:

path/to/image1.jpg 1
path/to/image2.jpg 1
path/to/image3.jpg 2
path/to/image4.jpg 2
path/to/image5.jpg 3
path/to/image6.jpg 3

Example of a multi-class annotation file:

path/to/image1.jpg 1 3 5
path/to/image2.jpg 1 2
path/to/image3.jpg 2
path/to/image4.jpg 2 4 6 8
path/to/image5.jpg 3
path/to/image6.jpg 3

参数

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
**kwargs – Keyword arguments for BaseDataset.

class mmaction.datasets.MixupBlending(num_classes, alpha=0.2)[源代码]¶

Implementing Mixup in a mini-batch.

This module is proposed in mixup: Beyond Empirical Risk Minimization. Code Reference https://github.com/open-mmlab/mmclassification/blob/master/mmcls/models/utils/mixup.py

参数

num_classes (int) – The number of classes.
alpha (float) – Parameters for Beta distribution.

do_blending(imgs, label, **kwargs)[源代码]¶: Blending images with mixup.
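Mixup itself can be sketched in a few lines. The version below is an illustration under assumptions: images are flat lists of floats, labels are class indices converted to one-hot vectors, and each sample is blended with a randomly permuted partner using a single lam drawn from Beta(alpha, alpha). The function name mixup and the rng argument are not part of the documented API.

```python
import random


def mixup(imgs, labels, num_classes, alpha=0.2, rng=random):
    """Blend each sample with a shuffled partner (pure-Python sketch)."""
    lam = rng.betavariate(alpha, alpha)
    perm = list(range(len(imgs)))
    rng.shuffle(perm)
    one_hot = [[1.0 if c == y else 0.0 for c in range(num_classes)]
               for y in labels]
    mixed_imgs = [[lam * a + (1 - lam) * b
                   for a, b in zip(imgs[i], imgs[j])]
                  for i, j in enumerate(perm)]
    mixed_labels = [[lam * a + (1 - lam) * b
                     for a, b in zip(one_hot[i], one_hot[j])]
                    for i, j in enumerate(perm)]
    return mixed_imgs, mixed_labels, lam


imgs = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
mixed_imgs, mixed_labels, lam = mixup(imgs, [0, 1, 2], num_classes=3,
                                      rng=random.Random(0))
print(lam, mixed_labels)
```

Each blended label remains a valid probability distribution, so it can be used with a soft-label cross entropy loss.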

class mmaction.datasets.RawVideoDataset(ann_file, pipeline, clipname_tmpl='part_{}.mp4', sampling_strategy='positive', **kwargs)[源代码]¶

RawVideo dataset for action recognition, used in the Project OmniSource.

The dataset loads clips of raw videos and applies specified transforms to return a dict containing the frame tensors and other information. Note that for this dataset, multi_class should be False.

The ann_file is a text file with multiple lines, and each line indicates a sample video with the filepath (without suffix), label, number of clips and indices of positive clips (starting from 0), which are separated by whitespace. Raw videos should first be trimmed into 10-second clips, organized in the following format:

some/path/D32_1gwq35E/part_0.mp4
some/path/D32_1gwq35E/part_1.mp4
......
some/path/D32_1gwq35E/part_n.mp4

Example of an annotation file:

some/path/D32_1gwq35E 66 10 0 1 2
some/path/-G-5CJ0JkKY 254 5 3 4
some/path/T4h1bvOd9DA 33 1 0
some/path/4uZ27ivBl00 341 2 0 1
some/path/0LfESFkfBSw 186 234 7 9 11
some/path/-YIsNpBEx6c 169 100 9 10 11

The first line indicates that the raw video some/path/D32_1gwq35E has action label 66, consists of 10 clips (from part_0.mp4 to part_9.mp4). The 1st, 2nd and 3rd clips are positive clips.
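A line in this format can be split as sketched below. The function name and dictionary keys are illustrative choices, not the names used by the dataset class; the clip file names are built with the default clipname_tmpl.

```python
def parse_rawvideo_line(line):
    """Split one annotation line into path, label, clip count, positives."""
    parts = line.split()
    return {
        "filename": parts[0],
        "label": int(parts[1]),
        "num_clips": int(parts[2]),
        "positive_clip_inds": [int(x) for x in parts[3:]],
    }


info = parse_rawvideo_line("some/path/D32_1gwq35E 66 10 0 1 2")
# Clip file names follow the default template 'part_{}.mp4'.
clip_names = ["part_{}.mp4".format(i) for i in info["positive_clip_inds"]]
print(info, clip_names)
```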

参数

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
sampling_strategy (str) – The strategy to sample clips from raw videos. Choices are ‘random’ or ‘positive’. Default: ‘positive’.
clipname_tmpl (str) – The template of clip name in the raw video. Default: ‘part_{}.mp4’.
**kwargs – Keyword arguments for BaseDataset.

load_annotations()[源代码]¶: Load annotation file to get video information.

load_json_annotations()[源代码]¶: Load json annotation file to get video information.

prepare_test_frames(idx)[源代码]¶: Prepare the frames for testing given the index.

prepare_train_frames(idx)[源代码]¶: Prepare the frames for training given the index.

sample_clip(results)[源代码]¶: Sample a clip from the raw video given the sampling strategy.

class mmaction.datasets.RawframeDataset(ann_file, pipeline, data_prefix=None, test_mode=False, filename_tmpl='img_{:05}.jpg', with_offset=False, multi_class=False, num_classes=None, start_index=1, modality='RGB', sample_by_class=False, power=None)[源代码]¶

Rawframe dataset for action recognition.

The dataset loads raw frames and applies specified transforms to return a dict containing the frame tensors and other information.

The ann_file is a text file with multiple lines, and each line indicates the directory of a video's frames, the total frames of the video and the label of the video, which are separated by whitespace. Example of an annotation file:

some/directory-1 163 1
some/directory-2 122 1
some/directory-3 258 2
some/directory-4 234 2
some/directory-5 295 3
some/directory-6 121 3

Example of a multi-class annotation file:

some/directory-1 163 1 3 5
some/directory-2 122 1 2
some/directory-3 258 2
some/directory-4 234 2 4 6 8
some/directory-5 295 3
some/directory-6 121 3

Example of a with_offset annotation file (clips from long videos), each line indicates the directory to frames of a video, the index of the start frame, total frames of the video clip and the label of a video clip, which are split with a whitespace.

some/directory-1 12 163 3
some/directory-2 213 122 4
some/directory-3 100 258 5
some/directory-4 98 234 2
some/directory-5 0 295 3
some/directory-6 50 121 3

参数

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
data_prefix (str | None) – Path to a directory where videos are held. Default: None.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
filename_tmpl (str) – Template for each filename. Default: ‘img_{:05}.jpg’.
with_offset (bool) – Determines whether the offset information is in ann_file. Default: False.
multi_class (bool) – Determines whether it is a multi-class recognition dataset. Default: False.
num_classes (int | None) – Number of classes in the dataset. Default: None.
modality (str) – Modality of data. Support ‘RGB’, ‘Flow’. Default: ‘RGB’.
sample_by_class (bool) – Sampling by class, should be set True when performing inter-class data balancing. Only compatible with multi_class == False. Only applies for training. Default: False.
power (float | None) – We support sampling data with the probability proportional to the power of its label frequency (freq ^ power) when sampling data. power == 1 indicates uniformly sampling all data; power == 0 indicates uniformly sampling all classes. Default: None.
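The three annotation layouts above (plain, multi-class, and with_offset) differ only in which fields follow the frame directory. A hypothetical parser covering all three might look like this; the function name and the dictionary keys are assumptions made for the example.

```python
def parse_rawframe_line(line, with_offset=False, multi_class=False):
    """Parse one rawframe annotation line into a dict (sketch)."""
    parts = line.split()
    info = {"frame_dir": parts[0]}
    idx = 1
    if with_offset:
        # The start-frame index precedes the frame count.
        info["offset"] = int(parts[idx])
        idx += 1
    info["total_frames"] = int(parts[idx])
    idx += 1
    labels = [int(x) for x in parts[idx:]]
    info["label"] = labels if multi_class else labels[0]
    return info


plain = parse_rawframe_line("some/directory-1 163 1")
multi = parse_rawframe_line("some/directory-4 234 2 4 6 8", multi_class=True)
offset = parse_rawframe_line("some/directory-1 12 163 3", with_offset=True)
print(plain, multi, offset)
```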

load_annotations()[源代码]¶: Load annotation file to get video information.

prepare_test_frames(idx)[源代码]¶: Prepare the frames for testing given the index.

prepare_train_frames(idx)[源代码]¶: Prepare the frames for training given the index.

class mmaction.datasets.RepeatDataset(dataset, times)[源代码]¶

A wrapper of repeated dataset.

The length of repeated dataset will be times larger than the original dataset. This is useful when the data loading time is long but the dataset is small. Using RepeatDataset can reduce the data loading time between epochs.

参数

dataset (Dataset) – The dataset to be repeated.
times (int) – Repeat times.
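
The wrapping idea can be sketched in a few lines. RepeatSketch is a hypothetical stand-in written for this example; it only shows how an index can be mapped back onto the original dataset and is not the library class itself.

```python
class RepeatSketch:
    """Illustrative wrapper that presents `dataset` as `times` copies."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __getitem__(self, idx):
        # Wrap the index so that it always points into the original data.
        return self.dataset[idx % self._ori_len]

    def __len__(self):
        return self.times * self._ori_len


repeated = RepeatSketch(["a", "b", "c"], times=2)
items = [repeated[i] for i in range(len(repeated))]
print(items)
```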

class mmaction.datasets.SSNDataset(ann_file, pipeline, train_cfg, test_cfg, data_prefix, test_mode=False, filename_tmpl='img_{:05d}.jpg', start_index=1, modality='RGB', video_centric=True, reg_normalize_constants=None, body_segments=5, aug_segments=(2, 2), aug_ratio=(0.5, 0.5), clip_len=1, frame_interval=1, filter_gt=True, use_regression=True, verbose=False)[源代码]¶

Proposal frame dataset for Structured Segment Networks.

Based on proposal information, the dataset loads raw frames and applies specified transforms to return a dict containing the frame tensors and other information.

The ann_file is a text file with multiple lines and each video’s information takes up several lines. This file can be a normalized file with percent or standard file with specific frame indexes. If the file is a normalized file, it will be converted into a standard file first.

Template information of a video in a standard file:

# index
video_id
num_frames
fps
num_gts
label, start_frame, end_frame
label, start_frame, end_frame
…
num_proposals
label, best_iou, overlap_self, start_frame, end_frame
label, best_iou, overlap_self, start_frame, end_frame
…

Example of a standard annotation file:

# 0
video_validation_0000202
5666
1
3
8 130 185
8 832 1136
8 1303 1381
5
8 0.0620 0.0620 790 5671
8 0.1656 0.1656 790 2619
8 0.0833 0.0833 3945 5671
8 0.0960 0.0960 4173 5671
8 0.0614 0.0614 3327 5671
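
A minimal sketch of reading one video's record from a standard file, assuming each field listed in the template sits on its own line and that the values within a line are separated by whitespace, as in the example. The function name parse_ssn_record and the returned key names are hypothetical.

```python
def parse_ssn_record(lines):
    """Parse the lines that describe a single video (illustrative only)."""
    it = iter(lines)
    index = int(next(it).lstrip("#").strip())
    video_id = next(it).strip()
    num_frames = int(next(it))
    fps = float(next(it))
    gts = []
    for _ in range(int(next(it))):  # num_gts
        label, start, end = next(it).split()
        gts.append((int(label), int(start), int(end)))
    proposals = []
    for _ in range(int(next(it))):  # num_proposals
        label, best_iou, overlap_self, start, end = next(it).split()
        proposals.append(
            (int(label), float(best_iou), float(overlap_self), int(start), int(end))
        )
    return {
        "index": index,
        "video_id": video_id,
        "num_frames": num_frames,
        "fps": fps,
        "gts": gts,
        "proposals": proposals,
    }


lines = [
    "# 0", "video_validation_0000202", "5666", "1", "3",
    "8 130 185", "8 832 1136", "8 1303 1381",
    "5",
    "8 0.0620 0.0620 790 5671", "8 0.1656 0.1656 790 2619",
    "8 0.0833 0.0833 3945 5671", "8 0.0960 0.0960 4173 5671",
    "8 0.0614 0.0614 3327 5671",
]
record = parse_ssn_record(lines)
print(record["video_id"], len(record["gts"]), len(record["proposals"]))
```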

参数

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
train_cfg (dict) – Config for training.
test_cfg (dict) – Config for testing.
data_prefix (str) – Path to a directory where videos are held.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
filename_tmpl (str) – Template for each filename. Default: ‘img_{:05}.jpg’.
start_index (int) – Specify a start index for frames in consideration of different filename format. Default: 1.
modality (str) – Modality of data. Support ‘RGB’, ‘Flow’. Default: ‘RGB’.
video_centric (bool) – Whether to sample proposals just from this video or sample proposals randomly from the entire dataset. Default: True.
reg_normalize_constants (list) – Regression target normalized constants, including mean and standard deviation of location and duration.
body_segments (int) – Number of segments in course period. Default: 5.
aug_segments (list[int]) – Number of segments in starting and ending period. Default: (2, 2).
aug_ratio (int | float | tuple[int | float]) – The ratio of the length of augmentation to that of the proposal. Default: (0.5, 0.5).
clip_len (int) – Frames of each sampled output clip. Default: 1.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.
filter_gt (bool) – Whether to filter videos with no annotation during training. Default: True.
use_regression (bool) – Whether to perform regression. Default: True.
verbose (bool) – Whether to print full information or not. Default: False.

construct_proposal_pools()[源代码]¶: Construct the positive proposal pool, incomplete proposal pool and background proposal pool of the entire dataset.

evaluate(results, metrics='mAP', metric_options={'mAP': {'eval_dataset': 'thumos14'}}, logger=None, **deprecated_kwargs)[源代码]¶

Evaluation in SSN proposal dataset.

参数

results (list[dict]) – Output results.
metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘mAP’.
metric_options (dict) – Dict for metric options. Options are eval_dataset for mAP. Default: dict(mAP=dict(eval_dataset='thumos14')).
logger (logging.Logger | None) – Logger for recording. Default: None.
deprecated_kwargs (dict) – Used for containing deprecated arguments. See ‘https://github.com/open-mmlab/mmaction2/pull/286’.

返回

Evaluation results for evaluation metrics.

返回类型

dict

get_all_gts()[源代码]¶: Fetch groundtruth instances of the entire dataset.

static get_negatives(proposals, incomplete_iou_threshold, background_iou_threshold, background_coverage_threshold=0.01, incomplete_overlap_threshold=0.7)[源代码]¶

Get negative proposals, including incomplete proposals and background proposals.

参数

proposals (list) – List of proposal instances(SSNInstance).
incomplete_iou_threshold (float) – Maximum threshold of overlap of incomplete proposals and groundtruths.
background_iou_threshold (float) – Maximum threshold of overlap of background proposals and groundtruths.
background_coverage_threshold (float) – Minimum coverage of background proposals in video duration. Default: 0.01.
incomplete_overlap_threshold (float) – Minimum percent of incomplete proposals’ own span contained in a groundtruth instance. Default: 0.7.

返回

(incompletes, backgrounds): incompletes and backgrounds are lists of incomplete proposal instances and background proposal instances, respectively.

返回类型

list[SSNInstance]

static get_positives(gts, proposals, positive_threshold, with_gt=True)[源代码]¶

Get positive/foreground proposals.

参数

gts (list) – List of groundtruth instances(SSNInstance).
proposals (list) – List of proposal instances(SSNInstance).
positive_threshold (float) – Minimum threshold of overlap of positive/foreground proposals and groundtruths.
with_gt (bool) – Whether to include groundtruth instances in positive proposals. Default: True.

返回

(positives): positives is a list of positive proposal instances.

返回类型

list[SSNInstance]

load_annotations()[源代码]¶: Load annotation file to get video information.

prepare_test_frames(idx)[源代码]¶: Prepare the frames for testing given the index.

prepare_train_frames(idx)[源代码]¶: Prepare the frames for training given the index.

results_to_detections(results, top_k=2000, **kwargs)[源代码]¶

Convert prediction results into detections.

参数

results (list) – Prediction results.
top_k (int) – Number of top results. Default: 2000.

返回

Detection results.

返回类型

list

class mmaction.datasets.VideoDataset(ann_file, pipeline, start_index=0, **kwargs)[源代码]¶

Video dataset for action recognition.

The dataset loads raw videos and applies specified transforms to return a dict containing the frame tensors and other information.

The ann_file is a text file with multiple lines, and each line indicates a sample video with its filepath and label, separated by whitespace. Example of an annotation file:

some/path/000.mp4 1
some/path/001.mp4 1
some/path/002.mp4 2
some/path/003.mp4 2
some/path/004.mp4 3
some/path/005.mp4 3

参数

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
start_index (int) – Specify a start index for frames in consideration of different filename format. However, when taking videos as input, it should be set to 0, since frames loaded from videos count from 0. Default: 0.
**kwargs – Keyword arguments for BaseDataset.

load_annotations()[源代码]¶: Load annotation file to get video information.

mmaction.datasets.build_dataloader(dataset, videos_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, seed=None, drop_last=False, pin_memory=True, **kwargs)[源代码]¶

Build PyTorch DataLoader.

In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.

参数

dataset (Dataset) – A PyTorch dataset.
videos_per_gpu (int) – Number of videos on each GPU, i.e., batch size of each GPU.
workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.
num_gpus (int) – Number of GPUs. Only used in non-distributed training. Default: 1.
dist (bool) – Distributed training/test or not. Default: True.
shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.
seed (int | None) – Seed to be used. Default: None.
drop_last (bool) – Whether to drop the last incomplete batch in epoch. Default: False
pin_memory (bool) – Whether to use pin_memory in DataLoader. Default: True
kwargs (dict, optional) – Any keyword argument to be used to initialize DataLoader.

返回

A PyTorch dataloader.

返回类型

DataLoader
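
The relation between the per-GPU arguments and the sizes of the resulting loader can be sketched as follows. The sketch assumes that, in non-distributed mode, the single loader serves all GPUs and therefore scales both values by num_gpus; that scaling rule and the helper name loader_sizes are assumptions, not a statement about the actual implementation.

```python
def loader_sizes(videos_per_gpu, workers_per_gpu, num_gpus=1, dist=True):
    """Return (batch_size, num_workers) for the loader (hypothetical helper)."""
    if dist:
        # One loader per GPU/process: per-GPU values are used directly.
        return videos_per_gpu, workers_per_gpu
    # One loader shared by all GPUs: assumed to scale with num_gpus.
    return num_gpus * videos_per_gpu, num_gpus * workers_per_gpu


print(loader_sizes(8, 2, num_gpus=4, dist=True))
print(loader_sizes(8, 2, num_gpus=4, dist=False))
```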

mmaction.datasets.build_dataset(cfg, default_args=None)[源代码]¶

Build a dataset from config dict.

参数

cfg (dict) – Config dict. It should at least contain the key “type”.
default_args (dict | None, optional) – Default initialization arguments. Default: None.

返回

The constructed dataset.

返回类型

Dataset
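
Building an object from a config dict with a "type" key can be sketched with a plain dictionary acting as a registry. The names DATASETS, register, ToyDataset and build_dataset_sketch are all hypothetical and exist only in this example; the real function may handle additional cases.

```python
DATASETS = {}  # hypothetical registry: class name -> class


def register(cls):
    DATASETS[cls.__name__] = cls
    return cls


@register
class ToyDataset:
    def __init__(self, ann_file, test_mode=False):
        self.ann_file = ann_file
        self.test_mode = test_mode


def build_dataset_sketch(cfg, default_args=None):
    """Instantiate the class named by cfg['type'] with the remaining keys."""
    args = dict(cfg)  # copy so that the caller's config is not modified
    obj_type = args.pop("type")
    # default_args only fill in keys that the config leaves out.
    for key, value in (default_args or {}).items():
        args.setdefault(key, value)
    return DATASETS[obj_type](**args)


cfg = dict(type="ToyDataset", ann_file="train_list.txt")
dataset = build_dataset_sketch(cfg, default_args=dict(test_mode=True))
print(type(dataset).__name__, dataset.ann_file, dataset.test_mode)
```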

pipelines¶

class mmaction.datasets.pipelines.AudioAmplify(ratio)[源代码]¶

Amplify the waveform.

Required keys are “audios”, added or modified keys are “audios”, “amplify_ratio”.

参数: ratio (float) – The ratio used to amplify the audio waveform.

class mmaction.datasets.pipelines.AudioDecode(fixed_length=32000)[源代码]¶

Sample the audio w.r.t. the frames selected.

参数: fixed_length (int) – As the audio clip selected by frames sampled may not be exactly the same, fixed_length will truncate or pad them into the same size. Default: 32000.

Required keys are “frame_inds”, “num_clips”, “total_frames”, “length”, added or modified keys are “audios”, “audios_shape”.
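
The truncate-or-pad behaviour described for fixed_length can be sketched on a plain list of samples. Padding at the end with zeros is an assumption made for this example, as is the helper name fix_length.

```python
def fix_length(samples, fixed_length, pad_value=0.0):
    """Truncate or right-pad `samples` to exactly `fixed_length` items."""
    if len(samples) >= fixed_length:
        return samples[:fixed_length]
    return samples + [pad_value] * (fixed_length - len(samples))


print(fix_length([0.1, 0.2, 0.3], 5))
print(fix_length([0.1, 0.2, 0.3], 2))
```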

class mmaction.datasets.pipelines.AudioDecodeInit(io_backend='disk', sample_rate=16000, pad_method='zero', **kwargs)[源代码]¶

Using librosa to initialize the audio reader.

Required keys are “audio_path”, added or modified keys are “length”, “sample_rate”, “audios”.

参数

io_backend (str) – IO backend where frames are stored. Default: ‘disk’.
sample_rate (int) – Audio sampling times per second. Default: 16000.

class mmaction.datasets.pipelines.AudioFeatureSelector(fixed_length=128)[源代码]¶

Sample the audio feature w.r.t. the frames selected.

Required keys are “audios”, “frame_inds”, “num_clips”, “length”, “total_frames”, added or modified keys are “audios”, “audios_shape”.

参数: fixed_length (int) – As the features selected by frames sampled may not be exactly the same, fixed_length will truncate or pad them into the same size. Default: 128.

class mmaction.datasets.pipelines.BuildPseudoClip(clip_len)[源代码]¶

Build pseudo clips with one single image by repeating it n times.

Required key is “imgs”, added or modified keys are “imgs”, “num_clips” and “clip_len”.

参数: clip_len (int) – Frames of the generated pseudo clips.

class mmaction.datasets.pipelines.CenterCrop(crop_size, lazy=False)[源代码]¶

Crop the center area from images.

Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox”, “lazy” and “img_shape”. Required key in “lazy” is “crop_bbox”, added or modified key is “crop_bbox”.

参数

crop_size (int | tuple[int]) – (w, h) of crop size.
lazy (bool) – Determine whether to apply lazy operation. Default: False.
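
A minimal sketch of how a centered crop box can be derived from the image size and crop_size. The (left, top, right, bottom) ordering of the returned box and the helper name center_crop_bbox are assumptions for illustration.

```python
def center_crop_bbox(img_w, img_h, crop_size):
    """Return (left, top, right, bottom) of a centered crop."""
    # An int is treated as a square crop; a tuple is read as (w, h).
    if isinstance(crop_size, int):
        crop_w, crop_h = crop_size, crop_size
    else:
        crop_w, crop_h = crop_size
    left = (img_w - crop_w) // 2
    top = (img_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


print(center_crop_bbox(340, 256, 224))
```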

class mmaction.datasets.pipelines.Collect(keys, meta_keys=('filename', 'label', 'original_shape', 'img_shape', 'pad_shape', 'flip_direction', 'img_norm_cfg'), meta_name='img_metas', nested=False)[源代码]¶

Collect data from the loader relevant to the specific task.

This keeps the items in keys as they are and collects the items in meta_keys into a meta item called meta_name. This is usually the last stage of the data loader pipeline. For example, when keys='imgs', meta_keys=('filename', 'label', 'original_shape'), meta_name='img_metas', the result will be a dict with keys 'imgs' and 'img_metas', where 'img_metas' is a DataContainer of another dict with keys 'filename', 'label', 'original_shape'.

参数

keys (Sequence[str]) – Required keys to be collected.
meta_name (str) – The name of the key that contains meta information. This key is always populated. Default: “img_metas”.
meta_keys (Sequence[str]) –
Keys that are collected under meta_name. The contents of the meta_name dictionary depend on meta_keys. By default this includes:
- “filename”: path to the image file
- “label”: label of the image file
- “original_shape”: original shape of the image as a tuple (h, w, c)
- “img_shape”: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right, if the batch tensor is larger than this shape.
- “pad_shape”: image shape after padding
- “flip_direction”: a str in (“horizontal”, “vertical”) to indicate if the image is flipped horizontally or vertically.
- “img_norm_cfg”: a dict of normalization information:
  - mean - per channel mean subtraction
  - std - per channel std divisor
  - to_rgb - bool indicating if bgr was converted to rgb
nested (bool) – If set as True, will apply data[x] = [data[x]] to all items in data. The arg is added for compatibility. Default: False.
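
The collection step described above can be sketched with plain dictionaries. The sketch stores the meta information in an ordinary dict rather than a DataContainer, ignores the nested option, and uses the hypothetical name collect_sketch.

```python
def collect_sketch(results, keys, meta_keys, meta_name="img_metas"):
    """Keep `keys` as they are and group `meta_keys` under `meta_name`."""
    data = {key: results[key] for key in keys}
    data[meta_name] = {key: results[key] for key in meta_keys}
    return data


results = {
    "imgs": [1, 2],
    "label": 3,
    "filename": "a.mp4",
    "original_shape": (256, 340),
    "extra": "not collected",
}
data = collect_sketch(
    results, keys=["imgs", "label"], meta_keys=["filename", "original_shape"]
)
print(data)
```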

class mmaction.datasets.pipelines.ColorJitter(color_space_aug=False, alpha_std=0.1, eig_val=None, eig_vec=None)[源代码]¶

Randomly distort the brightness, contrast, saturation and hue of images, and add PCA based noise into images.

Note: The input images should be in RGB channel order.

Code Reference: https://gluon-cv.mxnet.io/_modules/gluoncv/data/transforms/experimental/image.html and https://mxnet.apache.org/api/python/docs/_modules/mxnet/image/image.html#LightingAug

If specified to apply color space augmentation, it will distort the image color space by changing brightness, contrast and saturation. Then, it will add some random distort to the images in different color channels. Note that the input images should be in original range [0, 255] and in RGB channel sequence.

Required keys are “imgs”, added or modified keys are “imgs”, “eig_val”, “eig_vec”, “alpha_std” and “color_space_aug”.

参数

color_space_aug (bool) – Whether to apply color space augmentations. If specified, it will change the brightness, contrast, saturation and hue of images, then add PCA based noise to images. Otherwise, it will directly add PCA based noise to images. Default: False.
alpha_std (float) – Standard deviation of the Gaussian distribution from which alpha is drawn. Default: 0.1.
eig_val (np.ndarray | None) – Eigenvalues of [1 x 3] size for RGB channel jitter. If set to None, it will use the default eigenvalues. Default: None.
eig_vec (np.ndarray | None) – Eigenvectors of [3 x 3] size for RGB channel jitter. If set to None, it will use the default eigenvectors. Default: None.

static brightness(img, delta)[源代码]¶

Brightness distortion.

参数

img (np.ndarray) – An input image.
delta (float) – Delta value to distort brightness. It ranges from [-32, 32).

返回

A brightness distorted image.

返回类型

np.ndarray

static contrast(img, alpha)[源代码]¶

Contrast distortion.

参数

img (np.ndarray) – An input image.
alpha (float) – Alpha value to distort contrast. It ranges from [0.6, 1.4).

返回

A contrast distorted image.

返回类型

np.ndarray

static hue(img, alpha)[源代码]¶

Hue distortion.

参数

img (np.ndarray) – An input image.
alpha (float) – Alpha value to control the degree of rotation for hue. It ranges from [-18, 18).

返回

A hue distorted image.

返回类型

np.ndarray

static saturation(img, alpha)[源代码]¶

Saturation distortion.

参数

img (np.ndarray) – An input image.
alpha (float) – Alpha value to distort the saturation. It ranges from [0.6, 1.4).

返回

A saturation distorted image.

返回类型

np.ndarray

class mmaction.datasets.pipelines.Compose(transforms)[源代码]¶

Compose a data pipeline with a sequence of transforms.

参数: transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.
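
Chaining transforms can be sketched as below. The sketch accepts callables only (it does not build transforms from config dicts) and assumes that a transform returning None stops the pipeline; both simplifications, and the name ComposeSketch, are assumptions of this example.

```python
class ComposeSketch:
    """Apply a sequence of callables to a results dict, in order."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, results):
        for transform in self.transforms:
            results = transform(results)
            if results is None:  # assumed convention: None aborts the chain
                return None
        return results


pipeline = ComposeSketch([
    lambda r: {**r, "a": 1},
    lambda r: {**r, "b": r["a"] + 1},
])
out = pipeline({})
print(out)
```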

class mmaction.datasets.pipelines.DecordDecode[源代码]¶

Using decord to decode the video.

Decord: https://github.com/dmlc/decord

Required keys are “video_reader”, “filename” and “frame_inds”, added or modified keys are “imgs” and “original_shape”.

class mmaction.datasets.pipelines.DecordInit(io_backend='disk', num_threads=1, **kwargs)[源代码]¶

Using decord to initialize the video_reader.

Decord: https://github.com/dmlc/decord

Required keys are “filename”, added or modified keys are “video_reader” and “total_frames”.

class mmaction.datasets.pipelines.DenseSampleFrames(clip_len, frame_interval=1, num_clips=1, sample_range=64, num_sample_positions=10, temporal_jitter=False, out_of_bound_opt='loop', test_mode=False)[源代码]¶

Select frames from the video by dense sample strategy.

Required keys are “filename”, added or modified keys are “total_frames”, “frame_inds”, “frame_interval” and “num_clips”.

参数

clip_len (int) – Frames of each sampled output clip.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.
num_clips (int) – Number of clips to be sampled. Default: 1.
sample_range (int) – Total sample range for dense sample. Default: 64.
num_sample_positions (int) – Number of sample start positions, which is only used in test mode. Default: 10. That is, by default there are at least 10 clips for one input sample in test mode.
temporal_jitter (bool) – Whether to apply temporal jittering. Default: False.
test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmaction.datasets.pipelines.Flip(flip_ratio=0.5, direction='horizontal', flip_label_map=None, lazy=False)[源代码]¶

Flip the input images with a probability.

Reverse the order of elements in the given imgs along a specific direction. The shape of the imgs is preserved, but the elements are reordered. Required keys are “imgs”, “img_shape”, “modality”, added or modified keys are “imgs”, “lazy” and “flip_direction”. No keys are required in “lazy”, added or modified keys are “flip” and “flip_direction”. The Flip augmentation should be placed after any cropping / reshaping augmentations, to make sure crop_quadruple is calculated properly.

参数

flip_ratio (float) – Probability of implementing flip. Default: 0.5.
direction (str) – Flip imgs horizontally or vertically. Options are “horizontal” | “vertical”. Default: “horizontal”.
flip_label_map (Dict[int, int] | None) – Transform the label of the flipped image with the specific label. Default: None.
lazy (bool) – Determine whether to apply lazy operation. Default: False.

class mmaction.datasets.pipelines.FormatAudioShape(input_format)[源代码]¶

Format final audio shape to the given input_format.

Required keys are “imgs”, “num_clips” and “clip_len”, added or modified keys are “imgs” and “input_shape”.

参数: input_format (str) – Define the final imgs format.

class mmaction.datasets.pipelines.FormatShape(input_format, collapse=False)[源代码]¶

Format final imgs shape to the given input_format.

Required keys are “imgs”, “num_clips” and “clip_len”, added or modified keys are “imgs” and “input_shape”.

参数

input_format (str) – Define the final imgs format.
collapse (bool) – To collapse input_format N… to … (NCTHW to CTHW, etc.) if N is 1. Should be set as True when training and testing detectors. Default: False.

class mmaction.datasets.pipelines.FrameSelector(*args, **kwargs)[源代码]¶: Deprecated class for RawFrameDecode.

class mmaction.datasets.pipelines.Fuse[源代码]¶

Fuse lazy operations.

Fusion order: crop -> resize -> flip

Required keys are “imgs”, “img_shape” and “lazy”, added or modified keys are “imgs”, “lazy”. Required keys in “lazy” are “crop_bbox”, “interpolation”, “flip_direction”.

class mmaction.datasets.pipelines.GenerateLocalizationLabels[源代码]¶

Load video label for localizer with given video_name list.

Required keys are “duration_frame”, “duration_second”, “feature_frame”, “annotations”, added or modified keys are “gt_bbox”.

class mmaction.datasets.pipelines.ImageDecode(io_backend='disk', decoding_backend='cv2', **kwargs)[源代码]¶

Load and decode images.

Required key is “filename”, added or modified keys are “imgs”, “img_shape” and “original_shape”.

参数

io_backend (str) – IO backend where frames are stored. Default: ‘disk’.
decoding_backend (str) – Backend used for image decoding. Default: ‘cv2’.
kwargs (dict, optional) – Arguments for FileClient.

class mmaction.datasets.pipelines.ImageToTensor(keys)[源代码]¶

Convert image type to torch.Tensor type.

参数: keys (Sequence[str]) – Required keys to be converted.

class mmaction.datasets.pipelines.Imgaug(transforms)[源代码]¶

Imgaug augmentation.

Adds custom transformations from imgaug library. Please visit https://imgaug.readthedocs.io/en/latest/index.html to get more information. Two demo configs could be found in tsn and i3d config folder.

It’s better to use uint8 images as inputs since imgaug works best with numpy dtype uint8 and isn’t well tested with other dtypes. It should be noted that not all of the augmenters have the same input and output dtype, which may cause unexpected results.

Required keys are “imgs”, “img_shape”(if “gt_bboxes” is not None) and “modality”, added or modified keys are “imgs”, “img_shape”, “gt_bboxes” and “proposals”.

It is worth mentioning that Imgaug will NOT create custom keys like “interpolation”, “crop_bbox”, “flip_direction”, etc. So when using Imgaug along with other mmaction2 pipelines, we should pay more attention to required keys.

Two steps to use the Imgaug pipeline:

1. Create the initialization parameter transforms. There are three ways to create transforms.
   1) string: only 'default' is supported for now.
      e.g. transforms='default'
   2) list[dict]: create a list of augmenters from a list of dicts, where each dict corresponds to one augmenter. Every dict MUST contain a key named type. type should be a string (an iaa.Augmenter’s name) or an iaa.Augmenter subclass.
      e.g. transforms=[dict(type='Rotate', rotate=(-20, 20))]
      e.g. transforms=[dict(type=iaa.Rotate, rotate=(-20, 20))]
   3) iaa.Augmenter: create an imgaug.Augmenter object.
      e.g. transforms=iaa.Rotate(rotate=(-20, 20))
2. Add Imgaug to the dataset pipeline. It is recommended to insert the imgaug pipeline before Normalize. A demo pipeline is listed as follows.

```
# img_norm_cfg is expected to be defined elsewhere in the config.
pipeline = [
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=16,
    ),
    dict(type='RawFrameDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.875, 0.75, 0.66),
        random_crop=False,
        max_wh_scale_gap=1,
        num_fixed_crops=13),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Flip', flip_ratio=0.5),
    dict(type='Imgaug', transforms='default'),
    # dict(type='Imgaug', transforms=[
    #     dict(type='Rotate', rotate=(-20, 20))
    # ]),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
```

参数: transforms (str | list[dict] | iaa.Augmenter) – Three different ways to create imgaug augmenter.

default_transforms()[源代码]¶

Default transforms for imgaug.

Implement RandAugment by imgaug. Please visit https://arxiv.org/abs/1909.13719 for more information.

Augmenters and hyper parameters are borrowed from the following repo: https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/autoaugment.py

Miss one augmenter SolarizeAdd since imgaug doesn’t support this.

返回: The constructed RandAugment transforms.
返回类型: dict

imgaug_builder(cfg)[源代码]¶

Import a module from imgaug.

It follows the logic of build_from_cfg(). Use a dict object to create an iaa.Augmenter object.

参数: cfg (dict) – Config dict. It should at least contain the key “type”.
返回: iaa.Augmenter: The constructed imgaug augmenter.
返回类型: obj

class mmaction.datasets.pipelines.LoadAudioFeature(pad_method='zero')[源代码]¶

Load offline extracted audio features.

Required keys are “audio_path”, added or modified keys are “length”, “audios”.

class mmaction.datasets.pipelines.LoadHVULabel(**kwargs)[源代码]¶

Convert the HVU label from dictionaries to torch tensors.

Required keys are “label”, “categories”, “category_nums”, added or modified keys are “label”, “mask” and “category_mask”.

class mmaction.datasets.pipelines.LoadLocalizationFeature(raw_feature_ext='.csv')[源代码]¶

Load Video features for localizer with given video_name list.

Required keys are “video_name” and “data_prefix”, added or modified keys are “raw_feature”.

参数: raw_feature_ext (str) – Raw feature file extension. Default: ‘.csv’.

class mmaction.datasets.pipelines.LoadProposals(top_k, pgm_proposals_dir, pgm_features_dir, proposal_ext='.csv', feature_ext='.npy')[源代码]¶

Loading proposals with given proposal results.

Required keys are “video_name”, added or modified keys are ‘bsp_feature’, ‘tmin’, ‘tmax’, ‘tmin_score’, ‘tmax_score’ and ‘reference_temporal_iou’.

参数

top_k (int) – The top k proposals to be loaded.
pgm_proposals_dir (str) – Directory to load proposals.
pgm_features_dir (str) – Directory to load proposal features.
proposal_ext (str) – Proposal file extension. Default: ‘.csv’.
feature_ext (str) – Feature file extension. Default: ‘.npy’.

class mmaction.datasets.pipelines.MelSpectrogram(window_size=32, step_size=16, n_mels=80, fixed_length=128)[源代码]¶

MelSpectrogram. Transfer an audio wave into a mel spectrogram figure.

Required keys are “audios”, “sample_rate”, “num_clips”, added or modified keys are “audios”.

参数

window_size (int) – The window size in milliseconds. Default: 32.
step_size (int) – The step size in milliseconds. Default: 16.
n_mels (int) – Number of mels. Default: 80.
fixed_length (int) – The sample length of the mel spectrogram may not be exactly as desired because of differing fps; the length is fixed for batch collation by truncating or padding. Default: 128.
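
Because window_size and step_size are given in milliseconds, they have to be related to the audio sample rate before a spectrogram can be computed. The conversion below is a sketch of that arithmetic only; whether the class converts in exactly this way is an assumption, and ms_to_samples is a hypothetical helper.

```python
def ms_to_samples(duration_ms, sample_rate):
    """Convert a duration in milliseconds to a number of audio samples."""
    return int(round(duration_ms * sample_rate / 1000))


sample_rate = 16000  # samples per second
window_samples = ms_to_samples(32, sample_rate)
step_samples = ms_to_samples(16, sample_rate)
print(window_samples, step_samples)
```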

class mmaction.datasets.pipelines.MultiGroupCrop(crop_size, groups)[源代码]¶

Randomly crop the images into several groups.

Crop the random region with the same given crop_size and bounding box into several groups. Required keys are “imgs”, added or modified keys are “imgs”, “crop_bbox” and “img_shape”.

参数

crop_size (int | tuple[int]) – (w, h) of crop size.
groups (int) – Number of groups.

class mmaction.datasets.pipelines.MultiScaleCrop(input_size, scales=(1), max_wh_scale_gap=1, random_crop=False, num_fixed_crops=5, lazy=False)[源代码]¶

Crop images with a list of randomly selected scales.

Randomly select the w and h scales from a list of scales. A scale of 1 means the base size, which is the minimum of the image width and height. The gap between the scale levels of w and h is kept within a certain value to prevent too large or too small an aspect ratio. Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox”, “img_shape”, “lazy” and “scales”. Required key in “lazy” is “crop_bbox”, added or modified key is “crop_bbox”.

参数

input_size (int | tuple[int]) – (w, h) of network input.
scales (tuple[float]) – width and height scales to be selected.
max_wh_scale_gap (int) – Maximum gap of w and h scale levels. Default: 1.
random_crop (bool) – If set to True, the cropping bbox will be randomly sampled, otherwise it will be sampled from fixed regions. Default: False.
num_fixed_crops (int) – If set to 5, the cropping bbox will keep 5 basic fixed regions: “upper left”, “upper right”, “lower left”, “lower right”, “center”. If set to 13, the cropping bbox will append another 8 fixed regions: “center left”, “center right”, “lower center”, “upper center”, “upper left quarter”, “upper right quarter”, “lower left quarter”, “lower right quarter”. Default: 5.
lazy (bool) – Determine whether to apply lazy operation. Default: False.
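
The five basic fixed regions can be pictured as crop offsets inside the image. The sketch below places the corner crops flush with the image borders and the center crop in the middle; the exact offsets used by the library may differ, and five_fixed_offsets is a hypothetical helper.

```python
def five_fixed_offsets(img_w, img_h, crop_w, crop_h):
    """Return the (x, y) top-left offset of each basic fixed region."""
    max_x = img_w - crop_w
    max_y = img_h - crop_h
    return {
        "upper left": (0, 0),
        "upper right": (max_x, 0),
        "lower left": (0, max_y),
        "lower right": (max_x, max_y),
        "center": (max_x // 2, max_y // 2),
    }


offsets = five_fixed_offsets(320, 256, 224, 224)
print(offsets)
```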

class mmaction.datasets.pipelines.Normalize(mean, std, to_bgr=False, adjust_magnitude=False)[源代码]¶

Normalize images with the given mean and std value.

Required keys are “imgs”, “img_shape”, “modality”, added or modified keys are “imgs” and “img_norm_cfg”. If modality is ‘Flow’, the additional key “scale_factor” is required.

参数

mean (Sequence[float]) – Mean values of different channels.
std (Sequence[float]) – Std values of different channels.
to_bgr (bool) – Whether to convert channels from RGB to BGR. Default: False.
adjust_magnitude (bool) – Indicate whether to adjust the flow magnitude on ‘scale_factor’ when modality is ‘Flow’. Default: False.
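
Per-channel normalization can be sketched on a single pixel. The sketch assumes that, when to_bgr is set, the channel order is reversed before the mean and std are applied; that ordering and the helper name normalize_pixel are assumptions, and the flow-specific adjust_magnitude option is not covered.

```python
def normalize_pixel(pixel, mean, std, to_bgr=False):
    """Return (value - mean) / std for each channel of one pixel."""
    if to_bgr:
        # Assumed order of operations: swap channels first, then normalize.
        pixel = pixel[::-1]
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


mean = [100.0, 110.0, 120.0]
std = [50.0, 50.0, 50.0]
normalized = normalize_pixel([150.0, 110.0, 70.0], mean, std)
print(normalized)
```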

class mmaction.datasets.pipelines.OpenCVDecode[源代码]¶

Using OpenCV to decode the video.

Required keys are “video_reader”, “filename” and “frame_inds”, added or modified keys are “imgs”, “img_shape” and “original_shape”.

class mmaction.datasets.pipelines.OpenCVInit(io_backend='disk', **kwargs)[源代码]¶

Using OpenCV to initialize the video_reader.

Required keys are “filename”, added or modified keys are “new_path”, “video_reader” and “total_frames”.

class mmaction.datasets.pipelines.PyAVDecode(multi_thread=False)[源代码]¶

Using pyav to decode the video.

PyAV: https://github.com/mikeboers/PyAV

Required keys are “video_reader” and “frame_inds”, added or modified keys are “imgs”, “img_shape” and “original_shape”.

参数: multi_thread (bool) – If set to True, it will apply multi thread processing. Default: False.

class mmaction.datasets.pipelines.PyAVDecodeMotionVector(multi_thread=False)[源代码]¶

Using pyav to decode the motion vectors from video.

Reference: https://github.com/PyAV-Org/PyAV/blob/main/tests/test_decode.py

Required keys are “video_reader” and “frame_inds”, added or modified keys are “motion_vectors”, “frame_inds”.

参数: multi_thread (bool) – If set to True, it will apply multi thread processing. Default: False.

class mmaction.datasets.pipelines.PyAVInit(io_backend='disk', **kwargs)[源代码]¶

Using pyav to initialize the video.

PyAV: https://github.com/mikeboers/PyAV

Required keys are “filename”, added or modified keys are “video_reader” and “total_frames”.

参数

io_backend (str) – IO backend where frames are stored. Default: ‘disk’.
kwargs (dict) – Args for file client.

class mmaction.datasets.pipelines.RandomCrop(size, lazy=False)[源代码]¶

Vanilla square random crop that specifies the output size.

Required keys in results are “imgs” and “img_shape”, added or modified keys are “imgs”, “lazy”; Required keys in “lazy” are “flip”, “crop_bbox”, added or modified key is “crop_bbox”.

参数

size (int) – The output size of the images.
lazy (bool) – Determine whether to apply lazy operation. Default: False.

class mmaction.datasets.pipelines.RandomRescale(scale_range, interpolation='bilinear')[源代码]¶

Randomly resize images so that the short edge is resized to a specific size in a given range. The aspect ratio is unchanged after resizing.

Required keys are “imgs”, “img_shape”, “modality”, added or modified keys are “imgs”, “img_shape”, “keep_ratio”, “scale_factor”, “resize_size”, “short_edge”.

参数

scale_range (tuple[int]) – The range of short edge length. A closed interval.
interpolation (str) – Algorithm used for interpolation: “nearest” | “bilinear”. Default: “bilinear”.

class mmaction.datasets.pipelines.RandomResizedCrop(area_range=(0.08, 1.0), aspect_ratio_range=(0.75, 1.3333333333333333), lazy=False)[源代码]¶

Random crop that specifies the area and height-width ratio range.

Required keys in results are “imgs”, “img_shape”, “crop_bbox” and “lazy”, added or modified keys are “imgs”, “crop_bbox” and “lazy”; Required keys in “lazy” are “flip”, “crop_bbox”, added or modified key is “crop_bbox”.

参数

area_range (Tuple[float]) – The candidate area scales range of output cropped images. Default: (0.08, 1.0).
aspect_ratio_range (Tuple[float]) – The candidate aspect ratio range of output cropped images. Default: (3 / 4, 4 / 3).
lazy (bool) – Determine whether to apply lazy operation. Default: False.

static get_crop_bbox(img_shape, area_range, aspect_ratio_range, max_attempts=10)[源代码]¶

Get a crop bbox given the area range and aspect ratio range.

参数

img_shape (Tuple[int]) – Image shape
area_range (Tuple[float]) – The candidate area scales range of output cropped images. Default: (0.08, 1.0).
aspect_ratio_range (Tuple[float]) – The candidate aspect ratio range of output cropped images. Default: (3 / 4, 4 / 3).
max_attempts (int) – Maximum number of attempts to generate a random candidate bounding box. If no qualified one is found, the center bounding box will be used. Default: 10.

返回

(list[int]) A random crop bbox within the area range and aspect ratio range.
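
The retry-then-fall-back behaviour can be sketched as follows. The sketch assumes img_shape is (h, w), draws the aspect ratio log-uniformly, and falls back to a centered square; these details and the name crop_bbox_sketch are assumptions rather than the library's exact procedure.

```python
import math
import random


def crop_bbox_sketch(img_shape, area_range, aspect_ratio_range, max_attempts=10):
    """Return (left, top, right, bottom) of a random crop, or a center crop."""
    img_h, img_w = img_shape
    log_lo = math.log(aspect_ratio_range[0])
    log_hi = math.log(aspect_ratio_range[1])
    for _ in range(max_attempts):
        target_area = random.uniform(*area_range) * img_h * img_w
        aspect = math.exp(random.uniform(log_lo, log_hi))  # width / height
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= img_w and 0 < crop_h <= img_h:
            left = random.randint(0, img_w - crop_w)
            top = random.randint(0, img_h - crop_h)
            return left, top, left + crop_w, top + crop_h
    # No candidate fitted inside the image: use a centered square instead.
    side = min(img_h, img_w)
    left = (img_w - side) // 2
    top = (img_h - side) // 2
    return left, top, left + side, top + side


random.seed(0)
print(crop_bbox_sketch((256, 340), (0.08, 1.0), (3 / 4, 4 / 3)))
```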

class mmaction.datasets.pipelines.RandomScale(scales, mode='range', **kwargs)[源代码]¶

Resize images by a random scale.

Required keys are “imgs”, “img_shape”, “modality”, added or modified keys are “imgs”, “img_shape”, “keep_ratio”, “scale_factor”, “lazy”, “scale”, “resize_size”. No keys are required in “lazy”, added or modified key is “interpolation”.

参数

scales (tuple[int]) – Tuple of scales to be chosen for resize.
mode (str) – Selection mode for choosing the scale. Options are “range” and “value”. If set to “range”, the short edge will be randomly chosen from the range of minimum and maximum on the shorter one in all tuples. Otherwise, the longer edge will be randomly chosen from the range of minimum and maximum on the longer one in all tuples. Default: ‘range’.

class mmaction.datasets.pipelines.RawFrameDecode(io_backend='disk', decoding_backend='cv2', **kwargs)[source]¶

Load and decode frames with given indices.

Required keys are “frame_dir”, “filename_tmpl” and “frame_inds”, added or modified keys are “imgs”, “img_shape” and “original_shape”.

Parameters

io_backend (str) – IO backend where frames are stored. Default: ‘disk’.
decoding_backend (str) – Backend used for image decoding. Default: ‘cv2’.
kwargs (dict, optional) – Arguments for FileClient.

class mmaction.datasets.pipelines.Rename(mapping)[source]¶

Rename the key in results.

Parameters: mapping (dict) – The keys in results that need to be renamed. Each key of the dict is the original name and its value is the new name. If the original name is not found in results, nothing is done. Default: dict().
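
The renaming rule can be pictured with a few lines of plain Python. This is only a sketch of the described behaviour; rename_keys is a hypothetical helper and not part of mmaction.

```python
def rename_keys(results, mapping):
    """Rename keys of ``results`` in place according to ``mapping``."""
    for old_name, new_name in mapping.items():
        # Keys that are absent from results are silently skipped.
        if old_name in results:
            results[new_name] = results.pop(old_name)
    return results


renamed = rename_keys({'imgs': 1, 'label': 2},
                      {'imgs': 'frames', 'missing': 'unused'})
```

Here “imgs” becomes “frames”, while the “missing” entry has no effect because that key is not present.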

class mmaction.datasets.pipelines.Resize(scale, keep_ratio=True, interpolation='bilinear', lazy=False)[source]¶

Resize images to a specific size.

Required keys are “imgs”, “img_shape”, “modality”, added or modified keys are “imgs”, “img_shape”, “keep_ratio”, “scale_factor”, “lazy”, “resize_size”. There are no required keys in “lazy”; the added or modified key is “interpolation”.

Parameters

scale (float | Tuple[int]) – If keep_ratio is True, it serves as scaling factor or maximum size: If it is a float number, the image will be rescaled by this factor, else if it is a tuple of 2 integers, the image will be rescaled as large as possible within the scale. Otherwise, it serves as (w, h) of output size.
keep_ratio (bool) – If set to True, Images will be resized without changing the aspect ratio. Otherwise, it will resize images to a given size. Default: True.
interpolation (str) – Algorithm used for interpolation: “nearest” | “bilinear”. Default: “bilinear”.
lazy (bool) – Determine whether to apply lazy operation. Default: False.
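
To make the three meanings of scale concrete, the sketch below computes the output size for each case. It is an illustration under assumptions, not the library's code: resolve_resize_size is a hypothetical name, img_shape is assumed to be (h, w), and rounding to the nearest integer is an assumed convention.

```python
def resolve_resize_size(img_shape, scale, keep_ratio=True):
    """Return the (w, h) output size for an image of shape (h, w)."""
    img_h, img_w = img_shape
    if not keep_ratio:
        # scale is taken as the exact (w, h) of the output.
        new_w, new_h = scale
        return int(new_w), int(new_h)
    if isinstance(scale, (int, float)):
        # A single number is a plain scaling factor.
        factor = float(scale)
    else:
        # A tuple is a bounding size: grow the image as much as possible
        # while its long edge <= max(scale) and short edge <= min(scale).
        long_edge, short_edge = max(scale), min(scale)
        factor = min(long_edge / max(img_h, img_w),
                     short_edge / min(img_h, img_w))
    return int(img_w * factor + 0.5), int(img_h * factor + 0.5)
```

For a 320x240 image, a scale of 0.5 gives (160, 120), a scale of (340, 256) with keep_ratio=True gives (340, 255), and a scale of (224, 224) with keep_ratio=False gives exactly (224, 224).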

class mmaction.datasets.pipelines.SampleAVAFrames(clip_len, frame_interval=2, test_mode=False)[source]¶

class mmaction.datasets.pipelines.SampleFrames(clip_len, frame_interval=1, num_clips=1, temporal_jitter=False, twice_sample=False, out_of_bound_opt='loop', test_mode=False, start_index=None)[source]¶

Sample frames from the video.

Required keys are “filename”, “total_frames”, “start_index”, added or modified keys are “frame_inds”, “frame_interval” and “num_clips”.

Parameters

clip_len (int) – Frames of each sampled output clip.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.
num_clips (int) – Number of clips to be sampled. Default: 1.
temporal_jitter (bool) – Whether to apply temporal jittering. Default: False.
twice_sample (bool) – Whether to use twice sample when testing. If set to True, it will sample frames with and without fixed shift, which is commonly used for testing in TSM model. Default: False.
out_of_bound_opt (str) – The way to deal with out of bounds frame indexes. Available options are ‘loop’, ‘repeat_last’. Default: ‘loop’.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
start_index (None) – This argument is deprecated and has been moved to the dataset class (BaseDataset, VideoDataset, RawframeDataset, etc.); see https://github.com/open-mmlab/mmaction2/pull/89.
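
The interplay of clip_len, frame_interval, num_clips and out_of_bound_opt can be sketched as follows. This is a simplified, deterministic illustration rather than the library's sampler: sample_frame_indices is a hypothetical name, clips are assumed to be spread evenly over the video, and temporal jitter and twice sampling are left out.

```python
def sample_frame_indices(total_frames, clip_len, frame_interval=1,
                         num_clips=1, out_of_bound_opt='loop', start_index=0):
    """Return frame indices for ``num_clips`` evenly spaced clips."""
    span = clip_len * frame_interval  # frames covered by one clip
    # Assumption: clip start offsets are spread evenly over the valid range.
    max_offset = max(total_frames - span, 0)
    offsets = [int(max_offset * (i + 0.5) / num_clips)
               for i in range(num_clips)]

    indices = []
    for offset in offsets:
        for k in range(clip_len):
            idx = offset + k * frame_interval
            if out_of_bound_opt == 'loop':
                idx %= total_frames  # wrap around to the first frames
            elif out_of_bound_opt == 'repeat_last':
                idx = min(idx, total_frames - 1)  # clamp to the last frame
            else:
                raise ValueError(f'Illegal option: {out_of_bound_opt}')
            indices.append(idx + start_index)
    return indices
```

With a 5-frame video, clip_len=4 and frame_interval=2, the raw indices 0, 2, 4, 6 run past the end: ‘loop’ maps them to 0, 2, 4, 1 and ‘repeat_last’ maps them to 0, 2, 4, 4.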

class mmaction.datasets.pipelines.SampleProposalFrames(clip_len, body_segments, aug_segments, aug_ratio, frame_interval=1, test_interval=6, temporal_jitter=False, mode='train')[source]¶

Sample frames from proposals in the video.

Required keys are “total_frames” and “out_proposals”, added or modified keys are “frame_inds”, “frame_interval”, “num_clips”, “clip_len” and “num_proposals”.

Parameters

clip_len (int) – Frames of each sampled output clip.
body_segments (int) – Number of segments in course period.
aug_segments (list[int]) – Number of segments in starting and ending period.
aug_ratio (int | float | tuple[int | float]) – The ratio of the length of augmentation to that of the proposal.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.
test_interval (int) – Temporal interval of adjacent sampled frames in test mode. Default: 6.
temporal_jitter (bool) – Whether to apply temporal jittering. Default: False.
mode (str) – Choose ‘train’, ‘val’ or ‘test’ mode. Default: ‘train’.

class mmaction.datasets.pipelines.TenCrop(crop_size)[source]¶

Crop the images into 10 crops (corner + center + flip).

Crop the four corners and the center part of the image with the same given crop_size, and flip it horizontally. Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox” and “img_shape”.

Parameters: crop_size (int | tuple[int]) – (w, h) of crop size.
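
The "corner + center + flip" scheme can be sketched on a plain nested list standing in for an image. This is an illustration only: ten_crop is a hypothetical helper, the image is a list of rows rather than an array, and the order of the returned crops is an assumption.

```python
def ten_crop(img, crop_size):
    """Return ten crops of ``img``: five boxes, each plain and flipped."""
    if isinstance(crop_size, int):
        crop_w, crop_h = crop_size, crop_size
    else:
        crop_w, crop_h = crop_size  # given as (w, h)
    img_h, img_w = len(img), len(img[0])
    x_max, y_max = img_w - crop_w, img_h - crop_h

    # Top-left corners of the four corner crops and the center crop.
    offsets = [(0, 0), (x_max, 0), (0, y_max), (x_max, y_max),
               (x_max // 2, y_max // 2)]
    crops = []
    for x, y in offsets:
        crop = [row[x:x + crop_w] for row in img[y:y + crop_h]]
        flipped = [row[::-1] for row in crop]  # horizontal flip
        crops.extend([crop, flipped])
    return crops


image = [[r * 4 + c for c in range(4)] for r in range(4)]
crops = ten_crop(image, 2)
```

Each of the five boxes contributes two entries, so a single input image yields ten crops of identical size.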

class mmaction.datasets.pipelines.ThreeCrop(crop_size)[source]¶

Crop images into three crops.

Crop the images equally into three crops with equal intervals along the shorter side. Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox” and “img_shape”.

Parameters: crop_size (int | tuple[int]) – (w, h) of crop size.

class mmaction.datasets.pipelines.ToDataContainer(fields)[source]¶

Convert the data to DataContainer.

Parameters: fields (Sequence[dict]) – Required fields to be converted with keys and attributes. E.g. fields=(dict(key='gt_bbox', stack=False),). Note that key can also be a list of keys; if so, every tensor in the list will be converted to DataContainer.

class mmaction.datasets.pipelines.ToTensor(keys)[source]¶

Convert some values in results dict to torch.Tensor type in data loader pipeline.

Parameters: keys (Sequence[str]) – Required keys to be converted.

class mmaction.datasets.pipelines.Transpose(keys, order)[source]¶

Transpose image channels to a given order.

Parameters

keys (Sequence[str]) – Required keys to be converted.
order (Sequence[int]) – Image channel order.

class mmaction.datasets.pipelines.UntrimmedSampleFrames(clip_len=1, frame_interval=16, start_index=None)[source]¶

Sample frames from the untrimmed video.

Required keys are “filename”, “total_frames”, added or modified keys are “frame_inds”, “frame_interval” and “num_clips”.

Parameters

clip_len (int) – The length of sampled clips. Default: 1.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 16.
start_index (None) – This argument is deprecated and has been moved to the dataset class (BaseDataset, VideoDataset, RawframeDataset, etc.); see https://github.com/open-mmlab/mmaction2/pull/89.

samplers¶

class mmaction.datasets.samplers.DistributedPowerSampler(dataset, num_replicas=None, rank=None, power=1, seed=0)[source]¶

DistributedPowerSampler inheriting from torch.utils.data.DistributedSampler.

Samples are sampled with the probability that is proportional to the power of label frequency (freq ^ power). The sampler only applies to single class recognition dataset.

The default value of power is 1, which is equivalent to bootstrap sampling from the entire dataset.
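
The effect of power on the sampling distribution can be sketched in plain Python. This is not the sampler itself: power_sample is a hypothetical function, it ignores the distributed (rank and replica) aspects, and it assumes each sample carries exactly one integer label.

```python
import random
from collections import defaultdict


def power_sample(labels, num_samples, power=1.0, seed=0):
    """Draw indices with class probability proportional to freq ** power."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    classes = sorted(by_class)
    weights = [len(by_class[c]) ** power for c in classes]

    picked = []
    for _ in range(num_samples):
        # First pick a class by weight, then a sample uniformly inside it.
        cls = rng.choices(classes, weights=weights, k=1)[0]
        picked.append(rng.choice(by_class[cls]))
    return picked
```

With power=1 every sample is equally likely, which matches drawing with replacement from the whole dataset; with power=0 every class is equally likely, so rare classes are oversampled.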

class mmaction.datasets.samplers.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0)[source]¶

DistributedSampler inheriting from torch.utils.data.DistributedSampler.

Older versions of PyTorch have no shuffle argument; this subclass adds one to DistributedSampler.

mmaction.utils¶

class mmaction.utils.GradCAM(model, target_layer_name, colormap='viridis')[source]¶

GradCAM class helps create visualization results.

Visualization results are produced by blending heatmaps with the input images. This class is modified from https://github.com/facebookresearch/SlowFast/blob/master/slowfast/visualization/gradcam_utils.py

For more information about GradCAM, please visit: https://arxiv.org/pdf/1610.02391.pdf

class mmaction.utils.PreciseBNHook(dataloader, num_iters=200, interval=1)[source]¶

Precise BN hook.

dataloader¶

A PyTorch dataloader.

Type: DataLoader

num_iters¶

Number of iterations to update the bn stats. Default: 200.

Type: int

interval¶

Interval, in epochs, at which precise BN is performed. Default: 1.

Type: int

mmaction.utils.get_random_string(length=15)[source]¶

Get random string with letters and digits.

Parameters: length (int) – Length of random string. Default: 15.
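
A string of this kind can be produced with the standard library alone. The sketch below is an assumption about the general approach, not the function's actual source; random_string is a hypothetical name.

```python
import random
import string


def random_string(length=15):
    """Return ``length`` random ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return ''.join(random.choice(alphabet) for _ in range(length))


token = random_string()
```

Such strings are handy for temporary file or directory names, but random is not a cryptographic source, so they should not be used as secrets.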

mmaction.utils.get_root_logger(log_file=None, log_level=20)[source]¶

Use get_logger method in mmcv to get the root logger.

The logger will be initialized if it has not been initialized. By default a StreamHandler will be added. If log_file is specified, a FileHandler will also be added. The name of the root logger is the top-level package name, e.g., “mmaction”.

Parameters

log_file (str | None) – The log filename. If specified, a FileHandler will be added to the root logger.
log_level (int) – The root logger level. Note that only the process of rank 0 is affected, while other processes will set the level to “Error” and be silent most of the time.

Returns

The root logger.

Return type

logging.Logger

mmaction.utils.get_shm_dir()[source]¶

Get shm dir for temporary usage.

mmaction.utils.get_thread_id()[source]¶

Get current thread id.

mmaction.utils.import_module_error_class(module_name)[source]¶

When a class is imported incorrectly due to a missing module, raise an import error when the class is instantiated.

mmaction.utils.import_module_error_func(module_name)[source]¶

When a function is imported incorrectly due to a missing module, raise an import error when the function is called.
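
The deferred-error pattern behind these two helpers can be sketched as a decorator. This illustrates the idea only and is not the library's code: missing_module_func, read_video and the module name some_optional_module are all hypothetical.

```python
import functools


def missing_module_func(module_name):
    """Decorator: swap a function for one that raises ImportError on call."""
    def decorate(func):
        @functools.wraps(func)
        def raise_on_call(*args, **kwargs):
            raise ImportError(
                f'Please install {module_name} to use {func.__name__}.')
        return raise_on_call
    return decorate


@missing_module_func('some_optional_module')
def read_video(path):
    """Placeholder for a function that needs the missing module."""
```

Importing the package still succeeds when the optional dependency is absent; the error surfaces only when read_video is actually called.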

mmaction.localization¶

mmaction.localization.eval_ap(detections, gt_by_cls, iou_range)[source]¶

Evaluate average precisions.

Parameters

detections (dict) – Results of detections.
gt_by_cls (dict) – Information of groundtruth.
iou_range (list) – Ranges of iou.

Returns

Average precision values of classes at ious.

Return type

list

mmaction.localization.generate_bsp_feature(video_list, video_infos, tem_results_dir, pgm_proposals_dir, top_k=1000, bsp_boundary_ratio=0.2, num_sample_start=8, num_sample_end=8, num_sample_action=16, num_sample_interp=3, tem_results_ext='.csv', pgm_proposal_ext='.csv', result_dict=None)[source]¶

Generate Boundary-Sensitive Proposal Feature with given proposals.

Parameters

video_list (list[int]) – List of video indexes to generate bsp_feature.
video_infos (list[dict]) – List of video_info dict that contains ‘video_name’.
tem_results_dir (str) – Directory to load temporal evaluation results.
pgm_proposals_dir (str) – Directory to load proposals.
top_k (int) – Number of proposals to be considered. Default: 1000
bsp_boundary_ratio (float) – Ratio for proposal boundary (start/end). Default: 0.2.
num_sample_start (int) – Num of samples for actionness in start region. Default: 8.
num_sample_end (int) – Num of samples for actionness in end region. Default: 8.
num_sample_action (int) – Num of samples for actionness in center region. Default: 16.
num_sample_interp (int) – Num of samples for interpolation for each sample point. Default: 3.
tem_results_ext (str) – File extension for temporal evaluation model output. Default: ‘.csv’.
pgm_proposal_ext (str) – File extension for proposals. Default: ‘.csv’.
result_dict (dict | None) – The dict to save the results. Default: None.

Returns

A dict with video_name as keys and bsp_feature as values. If result_dict is not None, the results are saved to it.

Return type

dict

mmaction.localization.generate_candidate_proposals(video_list, video_infos, tem_results_dir, temporal_scale, peak_threshold, tem_results_ext='.csv', result_dict=None)[source]¶

Generate Candidate Proposals with given temporal evaluation results. Each proposal file will contain: ‘tmin,tmax,tmin_score,tmax_score,score,match_iou,match_ioa’.

Parameters

video_list (list[int]) – List of video indexes to generate proposals.
video_infos (list[dict]) – List of video_info dict that contains ‘video_name’, ‘duration_frame’, ‘duration_second’, ‘feature_frame’, and ‘annotations’.
tem_results_dir (str) – Directory to load temporal evaluation results.
temporal_scale (int) – The number (scale) on temporal axis.
peak_threshold (float) – The threshold for proposal generation.
tem_results_ext (str) – File extension for temporal evaluation model output. Default: ‘.csv’.
result_dict (dict | None) – The dict to save the results. Default: None.

Returns

A dict with video_name as keys and the proposal list as values. If result_dict is not None, the results are saved to it.

Return type

dict

mmaction.localization.load_localize_proposal_file(filename)[source]¶

Load the proposal file and split it into parts, each containing the information of one video.

Parameters: filename (str) – Path to the proposal file.
Returns: List of all videos’ information.
Return type: list

mmaction.localization.perform_regression(detections)[source]¶

Perform regression on detection results.

Parameters: detections (list) – Detection results before regression.
Returns: Detection results after regression.
Return type: list

mmaction.localization.soft_nms(proposals, alpha, low_threshold, high_threshold, top_k)[source]¶

Soft NMS for temporal proposals.

Parameters

proposals (np.ndarray) – Proposals generated by network.
alpha (float) – Alpha value of Gaussian decaying function.
low_threshold (float) – Low threshold for soft nms.
high_threshold (float) – High threshold for soft nms.
top_k (int) – Top k values to be considered.

Returns

The updated proposals.

Return type

np.ndarray
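
For readers unfamiliar with soft NMS, the sketch below shows one common Gaussian-decay variant on plain Python tuples. It should be read as an assumption-laden illustration, not as this function's source: gaussian_soft_nms is a hypothetical name, proposals are assumed to be (t_min, t_max, score) with times normalised to [0, 1], and the width-dependent threshold is an assumed way of combining low_threshold and high_threshold.

```python
import math


def gaussian_soft_nms(proposals, alpha, low_threshold, high_threshold, top_k):
    """Greedy soft NMS over (t_min, t_max, score) with Gaussian score decay."""
    remaining = [list(p) for p in proposals]
    kept = []
    while remaining and len(kept) < top_k:
        best = max(remaining, key=lambda p: p[2])
        remaining.remove(best)
        kept.append(tuple(best))
        width = best[1] - best[0]
        # Assumption: the overlap threshold moves from low to high as the
        # kept proposal gets wider.
        threshold = low_threshold + (high_threshold - low_threshold) * width
        for other in remaining:
            inter = max(min(best[1], other[1]) - max(best[0], other[0]), 0.0)
            union = width + (other[1] - other[0]) - inter
            iou = inter / union
            if iou > threshold:
                # Decay the score instead of discarding the proposal.
                other[2] *= math.exp(-iou * iou / alpha)
    return kept


ranked = gaussian_soft_nms(
    [(0.0, 0.5, 0.9), (0.05, 0.5, 0.8), (0.6, 0.9, 0.7)],
    alpha=0.4, low_threshold=0.5, high_threshold=0.9, top_k=3)
```

Unlike hard NMS, the heavily overlapping second proposal is not removed; its score is reduced so that it ranks below the non-overlapping third one.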

mmaction.localization.temporal_iop(proposal_min, proposal_max, gt_min, gt_max)[source]¶

Compute IoP score between a groundtruth bbox and the proposals.

Compute the IoP which is defined as the overlap ratio with groundtruth proportional to the duration of this proposal.

Parameters

proposal_min (list[float]) – List of temporal anchor min.
proposal_max (list[float]) – List of temporal anchor max.
gt_min (float) – Groundtruth temporal box min.
gt_max (float) – Groundtruth temporal box max.

Returns

List of intersection over anchor scores.

Return type

list[float]
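
The IoP definition above amounts to dividing the overlap with the groundtruth by each proposal's own duration. A minimal pure-Python sketch follows; iop_scores is a hypothetical name and the real function may operate on arrays instead of lists.

```python
def iop_scores(proposal_min, proposal_max, gt_min, gt_max):
    """Overlap with the groundtruth divided by each proposal's length."""
    scores = []
    for p_min, p_max in zip(proposal_min, proposal_max):
        # Length of the intersection, clipped at zero when disjoint.
        inter = max(min(p_max, gt_max) - max(p_min, gt_min), 0.0)
        scores.append(inter / (p_max - p_min))
    return scores


iop = iop_scores([0.0, 0.5, 2.0], [1.0, 1.5, 3.0], gt_min=0.0, gt_max=1.0)
```

A proposal that lies fully inside the groundtruth scores 1.0 regardless of how short it is, which is what distinguishes IoP from IoU.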

mmaction.localization.temporal_iou(proposal_min, proposal_max, gt_min, gt_max)[source]¶

Compute IoU score between a groundtruth bbox and the proposals.

Parameters

proposal_min (list[float]) – List of temporal anchor min.
proposal_max (list[float]) – List of temporal anchor max.
gt_min (float) – Groundtruth temporal box min.
gt_max (float) – Groundtruth temporal box max.

Returns

List of iou scores.

Return type

list[float]
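
Temporal IoU divides the overlap by the union of the two segments. The sketch below shows the arithmetic on plain lists; iou_scores is a hypothetical name and the real function may operate on arrays.

```python
def iou_scores(proposal_min, proposal_max, gt_min, gt_max):
    """Intersection over union between each proposal and the groundtruth."""
    gt_len = gt_max - gt_min
    scores = []
    for p_min, p_max in zip(proposal_min, proposal_max):
        # Length of the intersection, clipped at zero when disjoint.
        inter = max(min(p_max, gt_max) - max(p_min, gt_min), 0.0)
        union = (p_max - p_min) + gt_len - inter
        scores.append(inter / union)
    return scores


iou = iou_scores([0.0, 0.5, 2.0], [1.0, 1.5, 3.0], gt_min=0.0, gt_max=1.0)
```

The three proposals here coincide with, half overlap, and miss the groundtruth, giving scores of 1, 1/3 and 0 respectively.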

mmaction.localization.temporal_nms(detections, threshold)[source]¶

Apply temporal non-maximum suppression (NMS) to the detection results.

Parameters

detections (list) – Detection results before NMS.
threshold (float) – Threshold of NMS.

Returns

Detection results after NMS.

Return type

list
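
Greedy NMS over temporal segments can be sketched in a few lines. This is an illustration under assumptions rather than the function's source: greedy_temporal_nms is a hypothetical name, and each detection is assumed to be a (t_min, t_max, score) triple.

```python
def greedy_temporal_nms(detections, threshold):
    """Keep top-scoring segments; drop those overlapping a kept one."""
    kept = []
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        t_min, t_max = det[0], det[1]
        suppressed = False
        for k_min, k_max, _ in kept:
            inter = max(min(t_max, k_max) - max(t_min, k_min), 0.0)
            union = (t_max - t_min) + (k_max - k_min) - inter
            # Suppress when the IoU with a kept segment exceeds the threshold.
            if union > 0 and inter / union > threshold:
                suppressed = True
                break
        if not suppressed:
            kept.append(tuple(det))
    return kept


survivors = greedy_temporal_nms(
    [(0.0, 1.0, 0.9), (0.1, 1.0, 0.8), (2.0, 3.0, 0.5)], threshold=0.5)
```

The second detection overlaps the first with an IoU of 0.9 and is dropped, while the disjoint third detection survives.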