API Reference

mmaction.core

optimizer

class mmaction.core.optimizer.CopyOfSGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)[source]

A clone of torch.optim.SGD.

A customized optimizer can be defined in the same way as CopyOfSGD: either derive from a built-in optimizer in torch.optim, or implement a new optimizer directly.

class mmaction.core.optimizer.TSMOptimizerConstructor(optimizer_cfg, paramwise_cfg=None)[source]

Optimizer constructor in TSM model.

This constructor builds the optimizer differently from the default constructor:

  1. Parameters of the first conv layer have default lr and weight decay.

  2. Parameters of BN layers have default lr and zero weight decay.

  3. If the field “fc_lr5” in paramwise_cfg is set to True, the parameters of the last fc layer in cls_head have 5x lr multiplier and 10x weight decay multiplier.

  4. Weights of other layers have default lr and weight decay, and biases have a 2x lr multiplier and zero weight decay.
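The four rules above amount to a lookup from a parameter's role to its (lr, weight decay) pair. The following is a minimal sketch that takes the rules exactly as worded here. The `kind` tags and the function name are hypothetical, and the constructor's actual code is not reproduced.

```python
def tsm_param_options(kind, is_bias, base_lr, base_wd, fc_lr5=False):
    """Return (lr, weight_decay) for one parameter under the four rules.

    `kind` is a hypothetical tag: 'first_conv', 'bn', 'last_fc' or 'other'.
    """
    if kind == 'first_conv':            # rule 1: default lr and weight decay
        return base_lr, base_wd
    if kind == 'bn':                    # rule 2: default lr, zero weight decay
        return base_lr, 0.0
    if kind == 'last_fc' and fc_lr5:    # rule 3: 5x lr, 10x weight decay
        return base_lr * 5, base_wd * 10
    if is_bias:                         # rule 4: biases get 2x lr, no decay
        return base_lr * 2, 0.0
    return base_lr, base_wd             # rule 4: weights keep the defaults
```

With fc_lr5 left at False, the last fc layer falls through to rule 4 like any other layer.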

add_params(params, model)[source]

Add parameters and their corresponding lr and wd to the params.

Parameters
  • params (list) – The list to be modified, containing all parameter groups and their corresponding lr and wd configurations.

  • model (nn.Module) – The model to be trained with the optimizer.

evaluation

class mmaction.core.evaluation.DistEvalHook(dataloader, interval=1, gpu_collect=False, save_best=True, key_indicator='top1_acc', rule=None, **eval_kwargs)[source]

Distributed evaluation hook.

This hook performs evaluation at a given interval when running in a distributed environment.

Parameters
  • dataloader (DataLoader) – A PyTorch dataloader.

  • interval (int) – Evaluation interval (by epochs). Default: 1.

  • gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.

  • save_best (bool) – Whether to save best checkpoint during evaluation. Default: True.

  • key_indicator (str | None) – Key indicator to measure the best checkpoint during evaluation when save_best is set to True. Options are the evaluation metrics of the test dataset, e.g., top1_acc, top5_acc, mean_class_accuracy and mean_average_precision for action recognition datasets (RawframeDataset and VideoDataset), and AR@AN and auc for action localization datasets (ActivityNetDataset). Default: top1_acc.

  • rule (str | None) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Default: None.

  • eval_kwargs (dict, optional) – Arguments for evaluation.
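How a rule could be inferred from key_indicator when rule is None can be sketched as follows. The hint lists, the rule names 'greater' and 'less', and both function names are assumptions made for illustration. They are not taken from this reference.

```python
def infer_rule(key_indicator):
    """Guess whether larger or smaller values of a metric are better."""
    # Assumption: accuracy-like names improve upwards, loss-like names downwards.
    greater_hints = ('acc', 'top', 'AR@', 'auc', 'precision')
    less_hints = ('loss',)
    if any(hint in key_indicator for hint in greater_hints):
        return 'greater'
    if any(hint in key_indicator for hint in less_hints):
        return 'less'
    raise ValueError(f'cannot infer a rule for {key_indicator!r}')


def is_better(score, best, rule):
    """Compare a new score with the best one seen so far."""
    return score > best if rule == 'greater' else score < best
```

A hook that saves the best checkpoint would call is_better after each evaluation and keep the checkpoint only when it returns True.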

after_train_epoch(runner)[source]

Called after each training epoch to evaluate the model.

class mmaction.core.evaluation.EvalHook(dataloader, interval=1, gpu_collect=False, save_best=True, key_indicator='top1_acc', rule=None, **eval_kwargs)[source]

Non-Distributed evaluation hook.

This hook performs evaluation at a given interval when running in a non-distributed environment.

Parameters
  • dataloader (DataLoader) – A PyTorch dataloader.

  • interval (int) – Evaluation interval (by epochs). Default: 1.

  • gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.

  • save_best (bool) – Whether to save best checkpoint during evaluation. Default: True.

  • key_indicator (str | None) – Key indicator to measure the best checkpoint during evaluation when save_best is set to True. Options are the evaluation metrics of the test dataset, e.g., top1_acc, top5_acc, mean_class_accuracy and mean_average_precision for action recognition datasets (RawframeDataset and VideoDataset), and AR@AN and auc for action localization datasets (ActivityNetDataset). Default: top1_acc.

  • rule (str | None) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Default: None.

  • eval_kwargs (dict, optional) – Arguments for evaluation.

after_train_epoch(runner)[source]

Called after every training epoch to evaluate the results.

evaluate(runner, results)[source]

Evaluate the results.

Parameters
  • runner (mmcv.Runner) – The underlying training runner.

  • results (list) – Output results.

mmaction.core.evaluation.average_precision_at_temporal_iou(ground_truth, prediction, temporal_iou_thresholds=array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]))[source]

Compute average precision (in detection task) between ground truth and predicted data frames. If multiple predictions match the same ground-truth segment, only the one with the highest score is matched as a true positive. This code is greatly inspired by the Pascal VOC devkit.

Parameters
  • ground_truth (dict) – Dict containing the ground truth instances. Key: ‘video_id’. Value (np.ndarray): 1D array of ‘t-start’ and ‘t-end’.

  • prediction (np.ndarray) – 2D array containing the information of proposal instances, including ‘video_id’, ‘class_id’, ‘t-start’, ‘t-end’ and ‘score’.

  • temporal_iou_thresholds (np.ndarray) – 1D array with temporal_iou thresholds. Default: np.linspace(0.5, 0.95, 10).

Returns

1D array of average precision score.

Return type

np.ndarray
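The matching rule can be illustrated for a single video and a single threshold. This is a minimal pure-Python sketch of one reading of that rule, in which each ground-truth segment can be claimed by at most one prediction. The function names and the (t_start, t_end, score) tuple layout are assumptions, and the precision/recall integration that follows the labelling is left out.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def label_predictions(gt_segments, predictions, threshold):
    """Label predictions as true (True) or false (False) positives.

    The returned list follows descending score order.
    """
    matched = set()   # indices of ground-truth segments already claimed
    labels = []
    # Visit predictions from the highest score down so the best one wins.
    for start, end, _ in sorted(predictions, key=lambda p: -p[2]):
        ious = [temporal_iou((start, end), gt) for gt in gt_segments]
        candidates = [i for i, v in enumerate(ious)
                      if v >= threshold and i not in matched]
        if candidates:
            matched.add(max(candidates, key=lambda i: ious[i]))
            labels.append(True)
        else:
            labels.append(False)
    return labels
```

A lower-scored prediction that overlaps an already claimed ground-truth segment is counted as a false positive.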

mmaction.core.evaluation.average_recall_at_avg_proposals(ground_truth, proposals, total_num_proposals, max_avg_proposals=None, temporal_iou_thresholds=array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]))[source]

Computes the average recall given an average number (percentile) of proposals per video.

Parameters
  • ground_truth (dict) – Dict containing the ground truth instances.

  • proposals (dict) – Dict containing the proposal instances.

  • total_num_proposals (int) – Total number of proposals in the proposal dict.

  • max_avg_proposals (int | None) – Max number of proposals for one video. Default: None.

  • temporal_iou_thresholds (np.ndarray) – 1D array with temporal_iou thresholds. Default: np.linspace(0.5, 0.95, 10).

Returns

(recall, average_recall, proposals_per_video, auc). In recall, recall[i, j] is the recall at the i-th temporal_iou threshold and the j-th average number (percentile) of proposals per video. The average_recall is the recall averaged over the temporal_iou thresholds (1D array); this is equivalent to recall.mean(axis=0). The proposals_per_video is the average number of proposals per video. The auc is the area under the AR@AN curve.

Return type

tuple([np.ndarray, np.ndarray, np.ndarray, float])
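The relation between recall, average_recall and auc can be sketched with plain lists. The sketch assumes trapezoidal integration and leaves the area unnormalized. Whether the library rescales the result is not stated here, and the function name is hypothetical.

```python
def average_recall_and_auc(recall, proposals_per_video):
    """recall[i][j]: recall at the i-th tIoU threshold and j-th proposal budget."""
    num_thresholds = len(recall)
    # Average over thresholds (axis 0): one value per proposal budget.
    average_recall = [sum(row[j] for row in recall) / num_thresholds
                      for j in range(len(proposals_per_video))]
    # Trapezoidal area under the AR@AN curve (left unnormalized here).
    auc = sum((proposals_per_video[j + 1] - proposals_per_video[j])
              * (average_recall[j + 1] + average_recall[j]) / 2
              for j in range(len(proposals_per_video) - 1))
    return average_recall, auc
```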

mmaction.core.evaluation.confusion_matrix(y_pred, y_real, normalize=None)[source]

Compute confusion matrix.

Parameters
  • y_pred (list[int] | np.ndarray[int]) – Prediction labels.

  • y_real (list[int] | np.ndarray[int]) – Ground truth labels.

  • normalize (str | None) – Normalizes confusion matrix over the true (rows), predicted (columns) conditions or all the population. If None, confusion matrix will not be normalized. Options are “true”, “pred”, “all”, None. Default: None.

Returns

Confusion matrix.

Return type

np.ndarray
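The three normalize options differ only in which sum each cell is divided by. This pure-Python sketch uses nested lists instead of np.ndarray and a hypothetical function name. Taking the label set as the sorted union of both inputs is an assumption.

```python
def confusion_matrix_sketch(y_pred, y_real, normalize=None):
    """Rows index the true label, columns the predicted label."""
    labels = sorted(set(y_pred) | set(y_real))
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0.0] * n for _ in range(n)]
    for pred, real in zip(y_pred, y_real):
        matrix[index[real]][index[pred]] += 1
    if normalize == 'true':      # each row sums to 1
        sums = [sum(row) for row in matrix]
        matrix = [[v / s if s else 0.0 for v in row]
                  for row, s in zip(matrix, sums)]
    elif normalize == 'pred':    # each column sums to 1
        sums = [sum(row[j] for row in matrix) for j in range(n)]
        matrix = [[v / s if s else 0.0 for v, s in zip(row, sums)]
                  for row in matrix]
    elif normalize == 'all':     # the whole matrix sums to 1
        total = sum(map(sum, matrix))
        matrix = [[v / total if total else 0.0 for v in row] for row in matrix]
    return matrix
```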

mmaction.core.evaluation.get_weighted_score(score_list, coeff_list)[source]

Get weighted score with given scores and coefficients.

Given n predictions by different classifiers: [score_1, score_2, …, score_n] (score_list) and their coefficients: [coeff_1, coeff_2, …, coeff_n] (coeff_list), return the weighted score: weighted_score = score_1 * coeff_1 + score_2 * coeff_2 + … + score_n * coeff_n

Parameters
  • score_list (list[list[np.ndarray]]) – List of list of scores, with shape n(number of predictions) X num_samples X num_classes

  • coeff_list (list[float]) – List of coefficients, with shape n.

Returns

List of weighted scores.

Return type

list[np.ndarray]
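The weighted sum can be written out with nested lists standing in for the arrays. This is a sketch of the formula only, and the function name is hypothetical.

```python
def weighted_score(score_list, coeff_list):
    """score_list[k][s][c]: score of classifier k for sample s and class c."""
    if len(score_list) != len(coeff_list):
        raise ValueError('need exactly one coefficient per classifier')
    num_samples = len(score_list[0])
    return [
        [sum(coeff * scores[s][c]
             for scores, coeff in zip(score_list, coeff_list))
         for c in range(len(score_list[0][s]))]
        for s in range(num_samples)
    ]
```

A typical use is late fusion of two streams, for example RGB scores with coefficient 1 and optical-flow scores with coefficient 2.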

mmaction.core.evaluation.mean_average_precision(scores, labels)[source]

Mean average precision for multi-label recognition.

Parameters
  • scores (list[np.ndarray]) – Prediction scores for each class.

  • labels (list[np.ndarray]) – Ground truth many-hot vector.

Returns

The mean average precision.

Return type

np.float

mmaction.core.evaluation.mean_class_accuracy(scores, labels)[source]

Calculate mean class accuracy.

Parameters
  • scores (list[np.ndarray]) – Prediction scores for each class.

  • labels (list[int]) – Ground truth labels.

Returns

Mean class accuracy.

Return type

np.ndarray
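Mean class accuracy averages the per-class hit rate, so rare classes weigh as much as frequent ones. A minimal sketch follows. It assumes the prediction is the arg-max score and that only classes occurring in labels are averaged, and the function name is hypothetical.

```python
def mean_class_accuracy_sketch(scores, labels):
    """Average, over classes present in `labels`, of the per-class hit rate."""
    hits, counts = {}, {}
    for sample_scores, label in zip(scores, labels):
        pred = max(range(len(sample_scores)), key=sample_scores.__getitem__)
        counts[label] = counts.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + (pred == label)
    return sum(hits[c] / counts[c] for c in counts) / len(counts)
```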

mmaction.core.evaluation.pairwise_temporal_iou(candidate_segments, target_segments)[source]

Compute intersection over union between segments.

Parameters
  • candidate_segments (np.ndarray) – 1-dim/2-dim array in format [init, end]/[m x 2:=[init, end]].

  • target_segments (np.ndarray) – 2-dim array in format [n x 2:=[init, end]].

Returns

1-dim array [n] / 2-dim array [n x m] with IoU ratio.

Return type

t_iou (np.ndarray)
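For the 1-dim case, the computation reduces to an intersection length divided by a union length for each target. The sketch below covers only that case with plain lists, and its function name is hypothetical.

```python
def temporal_iou_1d(candidate, targets):
    """IoU between one [init, end] candidate and each of n [init, end] targets."""
    c_start, c_end = candidate
    ious = []
    for t_start, t_end in targets:
        intersection = max(0.0, min(c_end, t_end) - max(c_start, t_start))
        union = (c_end - c_start) + (t_end - t_start) - intersection
        ious.append(intersection / union if union > 0 else 0.0)
    return ious
```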

mmaction.core.evaluation.softmax(x, dim=1)[source]

Compute softmax values for each set of scores in x.
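For a num_samples x num_classes input, dim=1 normalizes each row. This is a pure-Python sketch with a hypothetical function name. Subtracting the row maximum before exponentiating is a common stability trick assumed here, not something this reference states.

```python
import math


def softmax_rows(x):
    """Softmax over each row of a 2D list."""
    result = []
    for row in x:
        shift = max(row)                        # avoids overflow in exp
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result
```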

mmaction.core.evaluation.top_k_accuracy(scores, labels, topk=(1,))[source]

Calculate top k accuracy score.

Parameters
  • scores (list[np.ndarray]) – Prediction scores for each class.

  • labels (list[int]) – Ground truth labels.

  • topk (tuple[int]) – K value for top_k_accuracy. Default: (1, ).

Returns

Top k accuracy score for each k.

Return type

list[float]
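A sample counts as a hit for a given k when its label is among the k highest-scoring classes. This is a minimal sketch with plain lists and a hypothetical function name. Tie-breaking follows Python's stable sort, which may differ from the library.

```python
def top_k_accuracy_sketch(scores, labels, topk=(1,)):
    """Return one accuracy value per k in `topk`."""
    results = []
    for k in topk:
        hits = 0
        for sample_scores, label in zip(scores, labels):
            # Class indices ordered from highest to lowest score.
            ranked = sorted(range(len(sample_scores)),
                            key=lambda c: sample_scores[c], reverse=True)
            hits += label in ranked[:k]
        results.append(hits / len(labels))
    return results
```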

fp16

dist utils

mmaction.core.dist_utils.allreduce_grads(params, coalesce=True, bucket_size_mb=-1)[source]

Allreduce gradients.

Parameters
  • params (list[torch.nn.Parameter]) – List of parameters of a model.

  • coalesce (bool, optional) – Whether to allreduce parameters as a whole. Default: True.

  • bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Default: -1.

mmaction.models

recognizers

localizers

common

backbones

heads

losses

mmaction.datasets

datasets

class mmaction.datasets.ActivityNetDataset(ann_file, pipeline, data_prefix=None, test_mode=False)[source]

ActivityNet dataset for temporal action localization.

The dataset loads raw features and applies specified transforms to return a dict containing the frame tensors and other information.

The ann_file is a json file with multiple objects. Each object has the name of a video as its key, and its value holds the total frames of the video, total seconds of the video, annotations of the video, feature frames (frames covered by features) of the video, fps and rfps. Example of an annotation file:

{
    "v_--1DO2V4K74":  {
        "duration_second": 211.53,
        "duration_frame": 6337,
        "annotations": [
            {
                "segment": [
                    30.025882995319815,
                    205.2318595943838
                ],
                "label": "Rock climbing"
            }
        ],
        "feature_frame": 6336,
        "fps": 30.0,
        "rfps": 29.9579255898
    },
    "v_--6bJUbfpnQ": {
        "duration_second": 26.75,
        "duration_frame": 647,
        "annotations": [
            {
                "segment": [
                    2.578755070202808,
                    24.914101404056165
                ],
                "label": "Drinking beer"
            }
        ],
        "feature_frame": 624,
        "fps": 24.0,
        "rfps": 24.1869158879
    },
    ...
}
Parameters
  • ann_file (str) – Path to the annotation file.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • data_prefix (str) – Path to a directory where videos are held. Default: None.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.
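Reading an annotation file of the shape described above takes only the standard json module. The sketch below flattens the mapping into a list of per-video dicts. The function name and the 'video_name' key used to store the video's name are assumptions.

```python
import json

ANNOTATION_JSON = """
{
    "v_--1DO2V4K74": {
        "duration_second": 211.53,
        "duration_frame": 6337,
        "annotations": [
            {"segment": [30.025882995319815, 205.2318595943838],
             "label": "Rock climbing"}
        ],
        "feature_frame": 6336,
        "fps": 30.0,
        "rfps": 29.9579255898
    }
}
"""


def load_video_infos(text):
    """Flatten {video_name: info} into a list of dicts."""
    video_infos = []
    for video_name, info in json.loads(text).items():
        video_info = dict(info)              # copy, leave the parsed JSON alone
        video_info['video_name'] = video_name
        video_infos.append(video_info)
    return video_infos


video_infos = load_video_infos(ANNOTATION_JSON)
```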

dump_results(results, out, output_format, version='VERSION 1.3')[source]

Dump data to json/csv files.

evaluate(results, metrics='AR@AN', max_avg_proposals=100, temporal_iou_thresholds=array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]), logger=None)[source]

Evaluation in feature dataset.

Parameters
  • results (list[dict]) – Output results.

  • metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘AR@AN’.

  • max_avg_proposals (int) – Max number of proposals to evaluate. Defaults: 100.

  • temporal_iou_thresholds (list) – Temporal IoU threshold for positive samples. Defaults: np.linspace(0.5, 0.95, 10).

  • logger (logging.Logger | None) – Training logger. Defaults: None.

Returns

Evaluation results for evaluation metrics.

Return type

dict

load_annotations()[source]

Load the annotation according to ann_file into video_infos.

prepare_test_frames(idx)[source]

Prepare the frames for testing given the index.

prepare_train_frames(idx)[source]

Prepare the frames for training given the index.

proposals2json(results, show_progress=False)[source]

Convert all proposals to a final dict(json) format.

Parameters
  • results (list[dict]) – All proposals.

  • show_progress (bool) – Whether to show the progress bar. Defaults: False.

Returns

The final result dict. E.g.

dict(video-1=[dict(segment=[1.1, 2.0], score=0.9),
              dict(segment=[50.1, 129.3], score=0.6)])

Return type

dict

class mmaction.datasets.BaseDataset(ann_file, pipeline, data_prefix=None, test_mode=False, multi_class=False, num_classes=None, start_index=1, modality='RGB')[source]

Base class for datasets.

All datasets that process video should subclass it. All subclasses should override:

  • Method load_annotations, which loads information from an annotation file.

  • Method prepare_train_frames, which provides train data.

  • Method prepare_test_frames, which provides test data.

Parameters
  • ann_file (str) – Path to the annotation file.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • data_prefix (str) – Path to a directory where videos are held. Default: None.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

  • multi_class (bool) – Determines whether the dataset is a multi-class dataset. Default: False.

  • num_classes (int) – Number of classes of the dataset, used in multi-class datasets. Default: None.

  • start_index (int) – Specify a start index for frames in consideration of different filename format. However, when taking videos as input, it should be set to 0, since frames loaded from videos count from 0. Default: 1.

  • modality (str) – Modality of data. Support ‘RGB’, ‘Flow’. Default: ‘RGB’.

dump_results(results, out)[source]

Dump data to json/yaml/pickle strings or files.

abstract evaluate(results, metrics, logger)[source]

Evaluation for the dataset.

Parameters
  • results (list) – Output results.

  • metrics (str | sequence[str]) – Metrics to be performed.

  • logger (logging.Logger | None) – Logger for recording.

Returns

Evaluation results dict.

Return type

dict

abstract load_annotations()[source]

Load the annotation according to ann_file into video_infos.

load_json_annotations()[source]

Load json annotation file to get video information.

prepare_test_frames(idx)[source]

Prepare the frames for testing given the index.

prepare_train_frames(idx)[source]

Prepare the frames for training given the index.

class mmaction.datasets.RawframeDataset(ann_file, pipeline, data_prefix=None, test_mode=False, filename_tmpl='img_{:05}.jpg', with_offset=False, multi_class=False, num_classes=None, start_index=1, modality='RGB')[source]

Rawframe dataset for action recognition.

The dataset loads raw frames and applies specified transforms to return a dict containing the frame tensors and other information.

The ann_file is a text file with multiple lines. Each line gives the directory holding the frames of a video, the total number of frames of the video and the label of the video, separated by whitespace. Example of an annotation file:

some/directory-1 163 1
some/directory-2 122 1
some/directory-3 258 2
some/directory-4 234 2
some/directory-5 295 3
some/directory-6 121 3

Example of a multi-class annotation file:

some/directory-1 163 1 3 5
some/directory-2 122 1 2
some/directory-3 258 2
some/directory-4 234 2 4 6 8
some/directory-5 295 3
some/directory-6 121 3

Example of a with_offset annotation file (clips from long videos), each line indicates the directory to frames of a video, the index of the start frame, total frames of the video clip and the label of a video clip, which are split with a whitespace.

some/directory-1 12 163 3
some/directory-2 213 122 4
some/directory-3 100 258 5
some/directory-4 98 234 2
some/directory-5 0 295 3
some/directory-6 50 121 3
Parameters
  • ann_file (str) – Path to the annotation file.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • data_prefix (str) – Path to a directory where videos are held. Default: None.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

  • filename_tmpl (str) – Template for each filename. Default: ‘img_{:05}.jpg’.

  • with_offset (bool) – Determines whether the offset information is in ann_file. Default: False.

  • multi_class (bool) – Determines whether it is a multi-class recognition dataset. Default: False.

  • num_classes (int) – Number of classes in the dataset. Default: None.

  • modality (str) – Modality of data. Support ‘RGB’, ‘Flow’. Default: ‘RGB’.
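The three annotation layouts shown above differ only in an optional offset field and in how many labels trail the frame count. The following is a sketch of a line parser. The function name and the dict keys ('frame_dir', 'offset', 'total_frames', 'label') are assumptions made for illustration.

```python
def parse_rawframe_line(line, with_offset=False, multi_class=False):
    """Parse one whitespace-separated annotation line into a dict."""
    fields = line.split()
    info = {'frame_dir': fields[0]}
    idx = 1
    if with_offset:
        # Clips from long videos carry the index of their start frame.
        info['offset'] = int(fields[idx])
        idx += 1
    info['total_frames'] = int(fields[idx])
    labels = [int(x) for x in fields[idx + 1:]]
    # Multi-class lines carry one or more labels; otherwise exactly one.
    info['label'] = labels if multi_class else labels[0]
    return info
```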

evaluate(results, metrics='top_k_accuracy', topk=(1, 5), logger=None)[source]

Evaluation in rawframe dataset.

Parameters
  • results (list) – Output results.

  • metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘top_k_accuracy’.

  • logger (logging.Logger | None) – Training logger. Defaults: None.

  • topk (int | tuple[int]) – K value for top_k_accuracy metric. Defaults: (1, 5).

Returns

Evaluation results dict.

Return type

dict

load_annotations()[source]

Load annotation file to get video information.

prepare_test_frames(idx)[source]

Prepare the frames for testing given the index.

prepare_train_frames(idx)[source]

Prepare the frames for training given the index.

class mmaction.datasets.RepeatDataset(dataset, times)[source]

A wrapper of repeated dataset.

The repeated dataset is longer than the original dataset by a factor of times. This is useful when the data loading time is long but the dataset is small. Using RepeatDataset can reduce the data loading time between epochs.

Parameters
  • dataset (Dataset) – The dataset to be repeated.

  • times (int) – Repeat times.
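The wrapper's behaviour can be pictured as modular indexing over the original dataset. The class below is a hypothetical stand-in that shows only that indexing trick. It is not the library class.

```python
class RepeatedView:
    """Expose `dataset` as if it were concatenated with itself `times` times."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __getitem__(self, idx):
        # Wrap around so indices past the original length reuse earlier samples.
        return self.dataset[idx % self._ori_len]

    def __len__(self):
        return self.times * self._ori_len
```

Because one epoch now covers the data several times, the per-epoch start-up cost of the data loader is paid less often.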

class mmaction.datasets.SSNDataset(ann_file, pipeline, train_cfg, test_cfg, data_prefix, test_mode=False, filename_tmpl='img_{:05d}.jpg', start_index=1, modality='RGB', video_centric=True, reg_normalize_constants=None, body_segments=5, aug_segments=(2, 2), aug_ratio=(0.5, 0.5), clip_len=1, frame_interval=1, filter_gt=True, use_regression=True, verbose=False)[source]

Proposal frame dataset for Structured Segment Networks.

Based on proposal information, the dataset loads raw frames and applies specified transforms to return a dict containing the frame tensors and other information.

The ann_file is a text file with multiple lines and each video’s information takes up several lines. This file can be a normalized file with percent or standard file with specific frame indexes. If the file is a normalized file, it will be converted into a standard file first.

Template information of a video in a standard file:

# index
video_id
num_frames
fps
num_gts
label, start_frame, end_frame
label, start_frame, end_frame
…
num_proposals
label, best_iou, overlap_self, start_frame, end_frame
label, best_iou, overlap_self, start_frame, end_frame
…

Example of a standard annotation file:

# 0
video_validation_0000202
5666
1
3
8 130 185
8 832 1136
8 1303 1381
5
8 0.0620 0.0620 790 5671
8 0.1656 0.1656 790 2619
8 0.0833 0.0833 3945 5671
8 0.0960 0.0960 4173 5671
8 0.0614 0.0614 3327 5671

Parameters
  • ann_file (str) – Path to the annotation file.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • train_cfg (dict) – Config for training.

  • test_cfg (dict) – Config for testing.

  • data_prefix (str) – Path to a directory where videos are held.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

  • filename_tmpl (str) – Template for each filename. Default: ‘img_{:05d}.jpg’.

  • start_index (int) – Specify a start index for frames in consideration of different filename format. Default: 1.

  • modality (str) – Modality of data. Support ‘RGB’, ‘Flow’. Default: ‘RGB’.

  • video_centric (bool) – Whether to sample proposals just from this video or sample proposals randomly from the entire dataset. Default: True.

  • reg_normalize_constants (list) – Regression target normalized constants, including mean and standard deviation of location and duration.

  • body_segments (int) – Number of segments in course period. Default: 5.

  • aug_segments (list[int]) – Number of segments in starting and ending period. Default: (2, 2).

  • aug_ratio (int | float | tuple[int | float]) – The ratio of the length of augmentation to that of the proposal. Default: (0.5, 0.5).

  • clip_len (int) – Frames of each sampled output clip. Default: 1.

  • frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.

  • filter_gt (bool) – Whether to filter videos with no annotation during training. Default: True.

  • use_regression (bool) – Whether to perform regression. Default: True.

  • verbose (bool) – Whether to print full information or not. Default: False.
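A record of the standard file described above can be read as a token stream: header fields, a counted list of ground truths, then a counted list of proposals. The parser below is a sketch that works on whitespace-separated tokens, so line breaks do not matter. The function name and the returned dict layout are assumptions.

```python
RECORD = ("# 0 video_validation_0000202 5666 1 3 "
          "8 130 185 8 832 1136 8 1303 1381 5 "
          "8 0.0620 0.0620 790 5671 8 0.1656 0.1656 790 2619 "
          "8 0.0833 0.0833 3945 5671 8 0.0960 0.0960 4173 5671 "
          "8 0.0614 0.0614 3327 5671")


def parse_ssn_record(text):
    """Parse one video's record from a standard file into a dict."""
    tokens = text.replace(',', ' ').split()
    if tokens[0] != '#':
        raise ValueError('a record must start with "#"')
    video_id = tokens[2]                     # tokens[1] is the running index
    num_frames = int(tokens[3])
    fps = float(tokens[4])
    num_gts = int(tokens[5])
    pos = 6
    gts = []
    for _ in range(num_gts):                 # label, start_frame, end_frame
        label, start, end = (int(t) for t in tokens[pos:pos + 3])
        gts.append((label, start, end))
        pos += 3
    num_proposals = int(tokens[pos])
    pos += 1
    proposals = []
    for _ in range(num_proposals):           # label, best_iou, overlap_self, start, end
        label, best_iou, overlap_self, start, end = tokens[pos:pos + 5]
        proposals.append((int(label), float(best_iou), float(overlap_self),
                          int(start), int(end)))
        pos += 5
    return {'video_id': video_id, 'num_frames': num_frames, 'fps': fps,
            'gts': gts, 'proposals': proposals}


record = parse_ssn_record(RECORD)
```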

construct_proposal_pools()[source]

Construct the positive proposal pool, incomplete proposal pool and background proposal pool of the entire dataset.

evaluate(results, metrics='mAP', eval_dataset='thumos14', **kwargs)[source]

Evaluation in SSN proposal dataset.

Parameters
  • results (list[dict]) – Output results.

  • metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘mAP’.

  • eval_dataset (str) – Dataset to be evaluated.

Returns

Evaluation results for evaluation metrics.

Return type

dict

get_all_gts()[source]

Fetch groundtruth instances of the entire dataset.

get_negatives(proposals, incomplete_iou_threshold, background_iou_threshold, background_coverage_threshold=0.01, incomplete_overlap_threshold=0.7)[source]

Get negative proposals, including incomplete proposals and background proposals.

Parameters
  • proposals (list) – List of proposal instances (SSNInstance).

  • incomplete_iou_threshold (float) – Maximum threshold of overlap of incomplete proposals and groundtruths.

  • background_iou_threshold (float) – Maximum threshold of overlap of background proposals and groundtruths.

  • background_coverage_threshold (float) – Minimum coverage of background proposals in video duration. Default: 0.01.

  • incomplete_overlap_threshold (float) – Minimum percent of incomplete proposals’ own span contained in a groundtruth instance. Default: 0.7.

Returns

(incompletes, backgrounds), where incompletes and backgrounds are lists of incomplete proposal instances and background proposal instances, respectively.

Return type

list[SSNInstance]

get_positives(gts, proposals, positive_threshold, with_gt=True)[source]

Get positive/foreground proposals.

Parameters
  • gts (list) – List of groundtruth instances (SSNInstance).

  • proposals (list) – List of proposal instances (SSNInstance).

  • positive_threshold (float) – Minimum threshold of overlap of positive/foreground proposals and groundtruths.

  • with_gt (bool) – Whether to include groundtruth instances in positive proposals. Default: True.

Returns

(positives), where positives is a list of positive proposal instances.

Return type

list[SSNInstance]

load_annotations()[source]

Load annotation file to get video information.

prepare_test_frames(idx)[source]

Prepare the frames for testing given the index.

prepare_train_frames(idx)[source]

Prepare the frames for training given the index.

results_to_detections(results, top_k=2000, softmax_before_filter=True, cls_top_k=2, **kwargs)[source]

Convert prediction results into detections.

Parameters
  • results (list) – Prediction results.

  • top_k (int) – Number of top results. Default: 2000.

  • softmax_before_filter (bool) – Whether to perform softmax operations before filtering results. Default: True.

  • cls_top_k (int) – Number of top results for each class. Default: 2.

Returns

Detection results.

Return type

list

class mmaction.datasets.VideoDataset(ann_file, pipeline, start_index=0, **kwargs)[source]

Video dataset for action recognition.

The dataset loads raw videos and applies specified transforms to return a dict containing the frame tensors and other information.

The ann_file is a text file with multiple lines. Each line describes a sample video by its filepath and label, separated by whitespace. Example of an annotation file:

some/path/000.mp4 1
some/path/001.mp4 1
some/path/002.mp4 2
some/path/003.mp4 2
some/path/004.mp4 3
some/path/005.mp4 3
Parameters
  • ann_file (str) – Path to the annotation file.

  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • start_index (int) – Specify a start index for frames in consideration of different filename format. However, when taking videos as input, it should be set to 0, since frames loaded from videos count from 0. Default: 0.

  • **kwargs – Keyword arguments for BaseDataset.

evaluate(results, metrics='top_k_accuracy', topk=(1, 5), logger=None)[source]

Evaluation in video dataset.

Parameters
  • results (list) – Output results.

  • metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘top_k_accuracy’.

  • logger (logging.Logger | None) – Training logger. Defaults: None.

  • topk (tuple[int]) – K value for top_k_accuracy metric. Defaults: (1, 5).

Returns

Evaluation results dict.

Return type

dict

load_annotations()[source]

Load annotation file to get video information.

mmaction.datasets.build_dataloader(dataset, videos_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, seed=None, drop_last=False, pin_memory=True, **kwargs)[source]

Build PyTorch DataLoader.

In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.

Parameters
  • dataset (Dataset) – A PyTorch dataset.

  • videos_per_gpu (int) – Number of videos on each GPU, i.e., batch size of each GPU.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.

  • num_gpus (int) – Number of GPUs. Only used in non-distributed training. Default: 1.

  • dist (bool) – Distributed training/test or not. Default: True.

  • shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.

  • seed (int | None) – Seed to be used. Default: None.

  • drop_last (bool) – Whether to drop the last incomplete batch in an epoch. Default: False.

  • pin_memory (bool) – Whether to use pin_memory in DataLoader. Default: True.

  • kwargs (dict, optional) – Any keyword argument to be used to initialize DataLoader.

Returns

A PyTorch dataloader.

Return type

DataLoader
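The difference between the two modes shows up in the sizes handed to the single DataLoader. A sketch of that arithmetic follows. It assumes the non-distributed loader must cover all GPUs at once, and the function name is hypothetical.

```python
def loader_sizes(videos_per_gpu, workers_per_gpu, num_gpus=1, dist=True):
    """Return (batch_size, num_workers) for one dataloader."""
    if dist:
        # One dataloader per GPU/process: sizes are per GPU.
        return videos_per_gpu, workers_per_gpu
    # One dataloader feeds every GPU, so it has to cover all of them.
    return num_gpus * videos_per_gpu, num_gpus * workers_per_gpu
```

This also shows why num_gpus is only used in non-distributed training: in the distributed branch it never enters the result.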

mmaction.datasets.build_dataset(cfg, default_args=None)[source]

Build a dataset from config dict.

Parameters
  • cfg (dict) – Config dict. It should at least contain the key “type”.

  • default_args (dict, optional) – Default initialization arguments. Default: None.

Returns

The constructed dataset.

Return type

Dataset

pipelines

class mmaction.datasets.pipelines.CenterCrop(crop_size, lazy=False)[source]

Crop the center area from images.

Required keys are “imgs” and “img_shape”; added or modified keys are “imgs”, “crop_bbox”, “lazy” and “img_shape”. The required key in “lazy” is “crop_bbox”; the added or modified key is “crop_bbox”.

Parameters
  • crop_size (int | tuple[int]) – (w, h) of crop size.

  • lazy (bool) – Determine whether to apply lazy operation. Default: False.
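The “crop_bbox” for a center crop follows from the image shape and crop_size alone. This sketch assumes img_shape is ordered (h, w) and that the box is returned as (left, top, right, bottom). Both conventions and the function name are assumptions.

```python
def center_crop_bbox(img_shape, crop_size):
    """Return (left, top, right, bottom) of the centered crop window."""
    if isinstance(crop_size, int):
        crop_size = (crop_size, crop_size)   # an int means a square crop
    crop_w, crop_h = crop_size               # crop_size is (w, h)
    img_h, img_w = img_shape                 # assumed (h, w) order
    left = (img_w - crop_w) // 2
    top = (img_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h
```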

class mmaction.datasets.pipelines.Collect(keys, meta_keys=('filename', 'label', 'original_shape', 'img_shape', 'pad_shape', 'flip_direction', 'img_norm_cfg'), meta_name='img_meta')[source]

Collect data from the loader relevant to the specific task.

This keeps the items in keys as they are and collects the items in meta_keys into a meta item called meta_name. This is usually the last stage of the data loader pipeline. For example, when keys=‘imgs’, meta_keys=(‘filename’, ‘label’, ‘original_shape’), meta_name=‘img_meta’, the results will be a dict with keys ‘imgs’ and ‘img_meta’, where ‘img_meta’ is a DataContainer of another dict with keys ‘filename’, ‘label’, ‘original_shape’.

Parameters
  • keys (Sequence[str]) – Required keys to be collected.

  • meta_name (str) – The name of the key that contains meta information. This key is always populated. Default: “img_meta”.

  • meta_keys (Sequence[str]) – Keys that are collected under meta_name. The contents of the meta_name dictionary depend on meta_keys. By default this includes:

    • “filename”: path to the image file

    • “label”: label of the image file

    • “original_shape”: original shape of the image as a tuple (h, w, c)

    • “img_shape”: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero-padded on the bottom/right if the batch tensor is larger than this shape.

    • “pad_shape”: image shape after padding

    • “flip_direction”: a str in (“horizontal”, “vertical”) indicating whether the image is flipped horizontally or vertically

    • “img_norm_cfg”: a dict of normalization information:

      • mean - per channel mean subtraction

      • std - per channel std divisor

      • to_rgb - bool indicating if bgr was converted to rgb
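The split between keys and meta_keys can be sketched on a plain dict. In this sketch an ordinary dict stands in for the DataContainer wrapper, and the function name is hypothetical.

```python
def collect(results, keys, meta_keys, meta_name='img_meta'):
    """Keep `keys` as they are and gather `meta_keys` under `meta_name`."""
    data = {key: results[key] for key in keys}
    # A plain dict stands in for the DataContainer described above.
    data[meta_name] = {key: results[key] for key in meta_keys}
    return data
```

Everything in results that is named by neither keys nor meta_keys is dropped, which is why this step usually comes last in a pipeline.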

class mmaction.datasets.pipelines.Compose(transforms)[source]

Compose a data pipeline with a sequence of transforms.

Parameters

transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.
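Composition amounts to threading one results dict through each transform in turn. The sketch below handles callables only and does not build transforms from config dicts. Stopping early when a transform returns None is an assumption, and the class name is hypothetical.

```python
class ComposeSketch:
    """Apply a sequence of callables to a results dict, in order."""

    def __init__(self, transforms):
        self.transforms = list(transforms)   # callables only in this sketch

    def __call__(self, results):
        for transform in self.transforms:
            results = transform(results)
            if results is None:              # assumed: a transform may drop the sample
                return None
        return results
```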

class mmaction.datasets.pipelines.DecordDecode(**kwargs)[source]

Using decord to decode the video.

Decord: https://github.com/dmlc/decord

Required keys are “video_reader”, “filename” and “frame_inds”, added or modified keys are “imgs” and “original_shape”.

class mmaction.datasets.pipelines.DecordInit(io_backend='disk', num_threads=1, **kwargs)[source]

Using decord to initialize the video_reader.

Decord: https://github.com/dmlc/decord

Required keys are “filename”, added or modified keys are “video_reader” and “total_frames”.

class mmaction.datasets.pipelines.DenseSampleFrames(clip_len, frame_interval=1, num_clips=1, sample_range=64, num_sample_positions=10, temporal_jitter=False, out_of_bound_opt='loop', test_mode=False)[source]

Select frames from the video by dense sample strategy.

Required keys are “filename”, added or modified keys are “total_frames”, “frame_inds”, “frame_interval” and “num_clips”.

Parameters
  • clip_len (int) – Frames of each sampled output clip.

  • frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.

  • num_clips (int) – Number of clips to be sampled. Default: 1.

  • sample_range (int) – Total sample range for dense sample. Default: 64.

  • num_sample_positions (int) – Number of sample start positions, which is only used in test mode. Default: 10.

  • temporal_jitter (bool) – Whether to apply temporal jittering. Default: False.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmaction.datasets.pipelines.Flip(flip_ratio=0.5, direction='horizontal', lazy=False)[source]

Flip the input images with a probability.

Reverse the order of elements in the given imgs along a specific direction. The shape of the imgs is preserved, but the elements are reordered. Required keys are “imgs”, “img_shape” and “modality”; added or modified keys are “imgs”, “lazy” and “flip_direction”. No keys are required in “lazy”; the added or modified keys are “flip” and “flip_direction”.

Parameters
  • flip_ratio (float) – Probability of implementing flip. Default: 0.5.

  • direction (str) – Flip imgs horizontally or vertically. Options are “horizontal” | “vertical”. Default: “horizontal”.

  • lazy (bool) – Determine whether to apply lazy operation. Default: False.

class mmaction.datasets.pipelines.FormatShape(input_format)[source]

Format final imgs shape to the given input_format.

Required keys are “imgs”, “num_clips” and “clip_len”, added or modified keys are “imgs” and “input_shape”.

Parameters

input_format (str) – Define the final imgs format.

class mmaction.datasets.pipelines.FrameSelector(*args, **kwargs)[source]

Deprecated class for RawFrameDecode.

class mmaction.datasets.pipelines.Fuse[source]

Fuse lazy operations.

Fusion order:

crop -> resize -> flip

Required keys are “imgs”, “img_shape” and “lazy”, added or modified keys are “imgs”, “lazy”. Required keys in “lazy” are “crop_bbox”, “interpolation”, “flip_direction”.

class mmaction.datasets.pipelines.GenerateLocalizationLabels[source]

Load video label for localizer with given video_name list.

Required keys are “duration_frame”, “duration_second”, “feature_frame”, “annotations”, added or modified keys are “gt_bbox”.

class mmaction.datasets.pipelines.ImageToTensor(keys)[source]

Convert image type to torch.Tensor type.

Parameters

keys (Sequence[str]) – Required keys to be converted.

class mmaction.datasets.pipelines.LoadLocalizationFeature(raw_feature_ext='.csv')[source]

Load Video features for localizer with given video_name list.

Required keys are “video_name” and “data_prefix”, added or modified keys are “raw_feature”.

Parameters

raw_feature_ext (str) – Raw feature file extension. Default: ‘.csv’.

class mmaction.datasets.pipelines.LoadProposals(top_k, pgm_proposals_dir, pgm_features_dir, proposal_ext='.csv', feature_ext='.npy')[source]

Loading proposals with given proposal results.

Required keys are “video_name”, added or modified keys are ‘bsp_feature’, ‘tmin’, ‘tmax’, ‘tmin_score’, ‘tmax_score’ and ‘reference_temporal_iou’.

Parameters
  • top_k (int) – The top k proposals to be loaded.

  • pgm_proposals_dir (str) – Directory to load proposals.

  • pgm_features_dir (str) – Directory to load proposal features.

  • proposal_ext (str) – Proposal file extension. Default: ‘.csv’.

  • feature_ext (str) – Feature file extension. Default: ‘.npy’.

class mmaction.datasets.pipelines.MultiGroupCrop(crop_size, groups)[source]

Randomly crop the images into several groups.

Crop the random region with the same given crop_size and bounding box into several groups. Required keys are “imgs”, added or modified keys are “imgs”, “crop_bbox” and “img_shape”.

Parameters
  • crop_size (int | tuple[int]) – (w, h) of crop size.

  • groups (int) – Number of groups.

class mmaction.datasets.pipelines.MultiScaleCrop(input_size, scales=(1), max_wh_scale_gap=1, random_crop=False, num_fixed_crops=5, lazy=False)[source]

Crop images with a list of randomly selected scales.

Randomly select the w and h scales from a list of scales. Scale of 1 means the base size, which is the minimum of image width and height. The gap between the scale levels of w and h is kept within a certain value (max_wh_scale_gap) to prevent an aspect ratio that is too large or too small. Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox”, “img_shape”, “lazy” and “scales”. The required key in “lazy” is “crop_bbox”, added or modified key is “crop_bbox”.

Parameters
  • input_size (int | tuple[int]) – (w, h) of network input.

  • scales (tuple[float]) – Width and height scales to be selected.

  • max_wh_scale_gap (int) – Maximum gap of w and h scale levels. Default: 1.

  • random_crop (bool) – If set to True, the cropping bbox will be randomly sampled, otherwise it will be sampled from fixed regions. Default: False.

  • num_fixed_crops (int) – If set to 5, the cropping bbox will keep 5 basic fixed regions: “upper left”, “upper right”, “lower left”, “lower right”, “center”. If set to 13, the cropping bbox will append another 8 fixed regions: “center left”, “center right”, “lower center”, “upper center”, “upper left quarter”, “upper right quarter”, “lower left quarter”, “lower right quarter”. Default: 5.

  • lazy (bool) – Determine whether to apply lazy operation. Default: False.
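
A minimal sketch of how scales and max_wh_scale_gap could interact, assuming that a scale's "level" is its index in the scales list and that the base size is the shorter image side. The helper name candidate_crop_sizes is hypothetical and the code is not taken from the library.

```python
def candidate_crop_sizes(img_w, img_h, scales, max_wh_scale_gap=1):
    """List the (crop_w, crop_h) pairs allowed by ``max_wh_scale_gap``.

    Assumption for this sketch: w and h each pick one entry of ``scales``,
    and a pair is allowed only when the two indices differ by at most
    ``max_wh_scale_gap``.
    """
    # Scale 1 corresponds to the shorter side of the image.
    base_size = min(img_w, img_h)
    sizes = [int(base_size * s) for s in scales]
    pairs = []
    for i, crop_h in enumerate(sizes):
        for j, crop_w in enumerate(sizes):
            # Reject pairs whose scale levels are too far apart, which
            # would give a very elongated crop.
            if abs(i - j) <= max_wh_scale_gap:
                pairs.append((crop_w, crop_h))
    return pairs
```

With a 320x240 image and scales (1, 0.875, 0.75), a gap of 1 allows sizes such as (210, 240) but rejects (180, 240), while a gap of 0 keeps only square crops.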

class mmaction.datasets.pipelines.Normalize(mean, std, to_bgr=False, adjust_magnitude=False)[source]

Normalize images with the given mean and std value.

Required keys are “imgs”, “img_shape”, “modality”, added or modified keys are “imgs” and “img_norm_cfg”. If modality is ‘Flow’, an additional key “scale_factor” is required.

Parameters
  • mean (Sequence[float]) – Mean values of different channels.

  • std (Sequence[float]) – Std values of different channels.

  • to_bgr (bool) – Whether to convert channels from RGB to BGR. Default: False.

  • adjust_magnitude (bool) – Indicate whether to adjust the flow magnitude on ‘scale_factor’ when modality is ‘Flow’. Default: False.
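
As a rough illustration of per-channel normalization, the sketch below applies (value - mean) / std to a single pixel. The helper name normalize_pixel is hypothetical, and the channel reordering (to_bgr) and flow-magnitude adjustment are deliberately left out.

```python
def normalize_pixel(pixel, mean, std):
    """Normalize one pixel channel by channel: (value - mean) / std.

    ``pixel``, ``mean`` and ``std`` are sequences with one entry per
    channel. This sketch ignores ``to_bgr`` and ``adjust_magnitude``.
    """
    if not (len(pixel) == len(mean) == len(std)):
        raise ValueError("pixel, mean and std must have the same length")
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]
```

A pixel equal to the mean maps to zero in every channel, and a pixel one standard deviation above the mean maps to one.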

class mmaction.datasets.pipelines.OpenCVDecode[source]

Using OpenCV to decode the video.

Required keys are “video_reader”, “filename” and “frame_inds”, added or modified keys are “imgs”, “img_shape” and “original_shape”.

class mmaction.datasets.pipelines.OpenCVInit(io_backend='disk', **kwargs)[source]

Using OpenCV to initialize the video_reader.

Required keys are “filename”, added or modified keys are “new_path”, “video_reader” and “total_frames”.

class mmaction.datasets.pipelines.PyAVDecode(multi_thread=False)[source]

Using pyav to decode the video.

PyAV: https://github.com/mikeboers/PyAV

Required keys are “video_reader” and “frame_inds”, added or modified keys are “imgs”, “img_shape” and “original_shape”.

Parameters

multi_thread (bool) – If set to True, it will apply multi thread processing. Default: False.

class mmaction.datasets.pipelines.PyAVInit(io_backend='disk', **kwargs)[source]

Using pyav to initialize the video.

PyAV: https://github.com/mikeboers/PyAV

Required keys are “filename”, added or modified keys are “video_reader”, and “total_frames”.

Parameters
  • io_backend (str) – IO backend where frames are stored. Default: ‘disk’.

  • kwargs (dict) – Args for file client.

class mmaction.datasets.pipelines.RandomCrop(size, lazy=False)[source]

Vanilla square random crop that specifies the output size.

Required keys in results are “imgs” and “img_shape”, added or modified keys are “imgs” and “lazy”; required keys in “lazy” are “flip”, “crop_bbox”, added or modified key is “crop_bbox”.

Parameters
  • size (int) – The output size of the images.

  • lazy (bool) – Determine whether to apply lazy operation. Default: False.

class mmaction.datasets.pipelines.RandomResizedCrop(area_range=(0.08, 1.0), aspect_ratio_range=(0.75, 1.3333333333333333), lazy=False)[source]

Random crop that specifies the area and height-width ratio range.

Required keys in results are “imgs”, “img_shape”, “crop_bbox” and “lazy”, added or modified keys are “imgs”, “crop_bbox” and “lazy”; required keys in “lazy” are “flip”, “crop_bbox”, added or modified key is “crop_bbox”.

Parameters
  • area_range (Tuple[float]) – The candidate area scales range of output cropped images. Default: (0.08, 1.0).

  • aspect_ratio_range (Tuple[float]) – The candidate aspect ratio range of output cropped images. Default: (3 / 4, 4 / 3).

  • lazy (bool) – Determine whether to apply lazy operation. Default: False.

static get_crop_bbox(img_shape, area_range, aspect_ratio_range, max_attempts=10)[source]

Get a crop bbox given the area range and aspect ratio range.

Parameters
  • img_shape (Tuple[int]) – Image shape

  • area_range (Tuple[float]) – The candidate area scales range of output cropped images. Default: (0.08, 1.0).

  • aspect_ratio_range (Tuple[float]) – The candidate aspect ratio range of output cropped images. Default: (3 / 4, 4 / 3).

  • max_attempts (int) – Maximum number of attempts to generate a random candidate bounding box. If no qualified one is found, the center bounding box will be used. Default: 10.

Returns

(list[int]) A random crop bbox within the area range and aspect ratio range.
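
The sampling loop with a center fallback can be sketched as follows. Several details are assumptions made for this example rather than facts from the reference: img_shape is taken as (height, width), the bbox is returned as [left, top, right, bottom], the aspect ratio (width / height) is drawn log-uniformly, and the fallback is a centered square on the shorter side. The helper name sample_crop_bbox is hypothetical.

```python
import math
import random


def sample_crop_bbox(img_shape, area_range=(0.08, 1.0),
                     aspect_ratio_range=(3 / 4, 4 / 3), max_attempts=10,
                     rng=random):
    """Draw a random crop bbox, falling back to a centered crop."""
    img_h, img_w = img_shape  # assumed (height, width) order
    area = img_h * img_w
    log_lo = math.log(aspect_ratio_range[0])
    log_hi = math.log(aspect_ratio_range[1])
    for _ in range(max_attempts):
        # Pick a target area and aspect ratio, then derive the crop size.
        target_area = rng.uniform(*area_range) * area
        aspect_ratio = math.exp(rng.uniform(log_lo, log_hi))
        crop_w = int(round(math.sqrt(target_area * aspect_ratio)))
        crop_h = int(round(math.sqrt(target_area / aspect_ratio)))
        # Accept the candidate only if it fits inside the image.
        if 0 < crop_w <= img_w and 0 < crop_h <= img_h:
            left = rng.randint(0, img_w - crop_w)
            top = rng.randint(0, img_h - crop_h)
            return [left, top, left + crop_w, top + crop_h]
    # No qualified candidate: use a centered square crop instead.
    crop_size = min(img_h, img_w)
    left = (img_w - crop_size) // 2
    top = (img_h - crop_size) // 2
    return [left, top, left + crop_size, top + crop_size]
```

Passing a seeded random.Random instance as rng makes the sketch reproducible.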

class mmaction.datasets.pipelines.RawFrameDecode(io_backend='disk', decoding_backend='cv2', **kwargs)[source]

Load and decode frames with given indices.

Required keys are “frame_dir”, “filename_tmpl” and “frame_inds”, added or modified keys are “imgs”, “img_shape” and “original_shape”.

Parameters
  • io_backend (str) – IO backend where frames are stored. Default: ‘disk’.

  • decoding_backend (str) – Backend used for image decoding. Default: ‘cv2’.

  • kwargs (dict, optional) – Arguments for FileClient.

class mmaction.datasets.pipelines.Resize(scale, keep_ratio=True, interpolation='bilinear', lazy=False)[source]

Resize images to a specific size.

Required keys are “imgs”, “img_shape”, “modality”, added or modified keys are “imgs”, “img_shape”, “keep_ratio”, “scale_factor”, “lazy”, “resize_size”. There are no required keys in “lazy”; the added or modified key is “interpolation”.

Parameters
  • scale (float | Tuple[int]) – If keep_ratio is True, it serves as scaling factor or maximum size: If it is a float number, the image will be rescaled by this factor, else if it is a tuple of 2 integers, the image will be rescaled as large as possible within the scale. Otherwise, it serves as (w, h) of output size.

  • keep_ratio (bool) – If set to True, images will be resized without changing the aspect ratio. Otherwise, it will resize images to a given size. Default: True.

  • interpolation (str) – Algorithm used for interpolation: “nearest” | “bilinear”. Default: “bilinear”.

  • lazy (bool) – Determine whether to apply lazy operation. Default: False.
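
The two meanings of scale when keep_ratio is True can be sketched as below. This is an illustrative helper (rescaled_size is a hypothetical name), assuming that a tuple bounds the longer side by its larger value and the shorter side by its smaller value, and that sizes are rounded to the nearest integer.

```python
def rescaled_size(old_w, old_h, scale):
    """Compute the output (w, h) for an aspect-ratio-preserving resize.

    ``scale`` is either a number (used directly as the scaling factor) or
    a tuple of two integers (the image is made as large as possible while
    staying within those bounds).
    """
    if isinstance(scale, (int, float)):
        factor = scale
    else:
        max_long_edge, max_short_edge = max(scale), min(scale)
        # The tighter of the two constraints decides the factor.
        factor = min(max_long_edge / max(old_w, old_h),
                     max_short_edge / min(old_w, old_h))
    # Round to the nearest integer size.
    return int(old_w * factor + 0.5), int(old_h * factor + 0.5)
```

For a 320x240 image, a scale of 0.5 gives 160x120, while a scale of (256, 256) gives 256x192 because the longer side reaches its bound first.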

class mmaction.datasets.pipelines.SampleFrames(clip_len, frame_interval=1, num_clips=1, temporal_jitter=False, twice_sample=False, out_of_bound_opt='loop', test_mode=False, start_index=None)[source]

Sample frames from the video.

Required keys are “filename”, “total_frames”, “start_index”, added or modified keys are “frame_inds”, “frame_interval” and “num_clips”.

Parameters
  • clip_len (int) – Frames of each sampled output clip.

  • frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.

  • num_clips (int) – Number of clips to be sampled. Default: 1.

  • temporal_jitter (bool) – Whether to apply temporal jittering. Default: False.

  • twice_sample (bool) – Whether to use twice sample when testing. If set to True, it will sample frames with and without fixed shift, which is commonly used for testing in TSM model. Default: False.

  • out_of_bound_opt (str) – The way to deal with out of bounds frame indexes. Available options are ‘loop’, ‘repeat_last’. Default: ‘loop’.

  • test_mode (bool) – Store True when building test or validation dataset. Default: False.

  • start_index (None) – This argument is deprecated and moved to dataset class (BaseDataset, VideoDataset, RawframeDataset, etc), see this: https://github.com/open-mmlab/mmaction2/pull/89.
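
The two out_of_bound_opt choices can be illustrated with a small sketch, assuming zero-based frame indices: ‘loop’ wraps around to the start of the video and ‘repeat_last’ clamps to the final frame. The helper name resolve_out_of_bound is hypothetical.

```python
def resolve_out_of_bound(frame_inds, total_frames, out_of_bound_opt="loop"):
    """Map frame indices that run past the end back into the video."""
    if out_of_bound_opt == "loop":
        # Wrap around: index total_frames becomes 0, and so on.
        return [i % total_frames for i in frame_inds]
    if out_of_bound_opt == "repeat_last":
        # Clamp: every index past the end points at the last frame.
        return [min(i, total_frames - 1) for i in frame_inds]
    raise ValueError(f"illegal out_of_bound option: {out_of_bound_opt}")
```

For a 10-frame video, the indices [8, 9, 10, 11] become [8, 9, 0, 1] with ‘loop’ and [8, 9, 9, 9] with ‘repeat_last’.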

class mmaction.datasets.pipelines.SampleProposalFrames(clip_len, body_segments, aug_segments, aug_ratio, frame_interval=1, test_interval=6, temporal_jitter=False, mode='train')[source]

Sample frames from proposals in the video.

Required keys are “total_frames” and “out_proposals”, added or modified keys are “frame_inds”, “frame_interval”, “num_clips”, ‘clip_len’ and ‘num_proposals’.

Parameters
  • clip_len (int) – Frames of each sampled output clip.

  • body_segments (int) – Number of segments in course period.

  • aug_segments (list[int]) – Number of segments in starting and ending period.

  • aug_ratio (int | float | tuple[int | float]) – The ratio of the length of augmentation to that of the proposal.

  • frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.

  • test_interval (int) – Temporal interval of adjacent sampled frames in test mode. Default: 6.

  • temporal_jitter (bool) – Whether to apply temporal jittering. Default: False.

  • mode (str) – Choose ‘train’, ‘val’ or ‘test’ mode. Default: ‘train’.

class mmaction.datasets.pipelines.TenCrop(crop_size)[source]

Crop the images into 10 crops (corner + center + flip).

Crop the four corners and the center part of the image with the same given crop_size, and flip it horizontally. Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox” and “img_shape”.

Parameters

crop_size (int | tuple[int]) – (w, h) of crop size.
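
A sketch of the five crop regions (four corners plus center) that a ten-crop scheme then mirrors horizontally. The helper name five_crop_bboxes, the [left, top, right, bottom] layout and the exact center offset are assumptions for illustration, not the library's code.

```python
def five_crop_bboxes(img_w, img_h, crop_w, crop_h):
    """Return bboxes for the four corner crops and the center crop."""
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("crop size must not exceed the image size")
    right = img_w - crop_w    # left offset of the right-hand crops
    bottom = img_h - crop_h   # top offset of the lower crops
    offsets = [
        (0, 0),                      # upper left
        (right, 0),                  # upper right
        (0, bottom),                 # lower left
        (right, bottom),             # lower right
        (right // 2, bottom // 2),   # center
    ]
    return [(x, y, x + crop_w, y + crop_h) for x, y in offsets]
```

Each of the five regions is cropped once as is and once flipped horizontally, which yields the ten crops.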

class mmaction.datasets.pipelines.ThreeCrop(crop_size)[source]

Crop images into three crops.

Crop the images equally into three crops with equal intervals along the shorter side. Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox” and “img_shape”.

Parameters

crop_size (int | tuple[int]) – (w, h) of crop size.

class mmaction.datasets.pipelines.ToDataContainer(fields)[source]

Convert the data to DataContainer.

Parameters

fields (Sequence[dict]) – Required fields to be converted with keys and attributes. E.g. fields=(dict(key='gt_bbox', stack=False),).

class mmaction.datasets.pipelines.ToTensor(keys)[source]

Convert some values in results dict to torch.Tensor type in data loader pipeline.

Parameters

keys (Sequence[str]) – Required keys to be converted.

class mmaction.datasets.pipelines.Transpose(keys, order)[source]

Transpose image channels to a given order.

Parameters
  • keys (Sequence[str]) – Required keys to be converted.

  • order (Sequence[int]) – Image channel order.

class mmaction.datasets.pipelines.UntrimmedSampleFrames(clip_len=1, frame_interval=16, start_index=1)[source]

Sample frames from the untrimmed video.

Required keys are “filename”, “total_frames”, added or modified keys are “frame_inds”, “frame_interval” and “num_clips”.

Parameters
  • clip_len (int) – The length of sampled clips. Default: 1.

  • frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 16.

  • start_index (int) – Specify a start index for frames in consideration of different filename format. However, when taking videos as input, it should be set to 0, since frames loaded from videos count from 0. Default: 1.

samplers

class mmaction.datasets.samplers.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True)[source]

DistributedSampler inheriting from torch.utils.data.DistributedSampler.

In older versions of PyTorch, there is no shuffle argument. This child class adds one to DistributedSampler.

mmaction.utils

mmaction.utils.get_random_string(length=15)[source]

Get random string with letters and digits.

Parameters

length (int) – Length of random string. Default: 15.

mmaction.utils.get_root_logger(log_file=None, log_level=20)[source]

Use get_logger method in mmcv to get the root logger.

The logger will be initialized if it has not been initialized. By default a StreamHandler will be added. If log_file is specified, a FileHandler will also be added. The name of the root logger is the top-level package name, e.g., “mmaction”.

Parameters
  • log_file (str | None) – The log filename. If specified, a FileHandler will be added to the root logger.

  • log_level (int) – The root logger level. Note that only the process of rank 0 is affected, while other processes will set the level to “Error” and be silent most of the time.

Returns

The root logger.

Return type

logging.Logger
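
The handler behaviour described above can be approximated with the standard logging module alone. This sketch is not the mmcv implementation: the helper name build_logger and the format string are assumptions, and the rank-dependent log level is omitted.

```python
import logging


def build_logger(name="mmaction", log_file=None, log_level=logging.INFO):
    """Create a named logger once and reuse it on later calls."""
    logger = logging.getLogger(name)
    if logger.handlers:
        # Already initialized: return it without adding duplicate handlers.
        return logger
    handlers = [logging.StreamHandler()]
    if log_file is not None:
        # A file handler is added only when a log file is given.
        handlers.append(logging.FileHandler(log_file, "w"))
    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    for handler in handlers:
        handler.setFormatter(formatter)
        handler.setLevel(log_level)
        logger.addHandler(handler)
    logger.setLevel(log_level)
    return logger
```

The default log_level of 20 in the signature above is the numeric value of logging.INFO.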

mmaction.utils.get_shm_dir()[source]

Get shm dir for temporary usage.

mmaction.utils.get_thread_id()[source]

Get current thread id.

mmaction.localization

mmaction.localization.eval_ap(detections, gt_by_cls, iou_range)[source]

Evaluate average precisions.

Parameters
  • detections (dict) – Results of detections.

  • gt_by_cls (dict) – Information of groundtruth.

  • iou_range (list) – Ranges of iou.

Returns

Average precision values of classes at ious.

Return type

list

mmaction.localization.generate_bsp_feature(video_list, video_infos, tem_results_dir, pgm_proposals_dir, top_k=1000, bsp_boundary_ratio=0.2, num_sample_start=8, num_sample_end=8, num_sample_action=16, num_sample_interp=3, tem_results_ext='.csv', pgm_proposal_ext='.csv', result_dict=None)[source]

Generate Boundary-Sensitive Proposal Feature with given proposals.

Parameters
  • video_list (list[int]) – List of video indexes to generate bsp_feature.

  • video_infos (list[dict]) – List of video_info dict that contains ‘video_name’.

  • tem_results_dir (str) – Directory to load temporal evaluation results.

  • pgm_proposals_dir (str) – Directory to load proposals.

  • top_k (int) – Number of proposals to be considered. Default: 1000

  • bsp_boundary_ratio (float) – Ratio for proposal boundary (start/end). Default: 0.2.

  • num_sample_start (int) – Num of samples for actionness in start region. Default: 8.

  • num_sample_end (int) – Num of samples for actionness in end region. Default: 8.

  • num_sample_action (int) – Num of samples for actionness in center region. Default: 16.

  • num_sample_interp (int) – Num of samples for interpolation for each sample point. Default: 3.

  • tem_results_ext (str) – File extension for temporal evaluation model output. Default: ‘.csv’.

  • pgm_proposal_ext (str) – File extension for proposals. Default: ‘.csv’.

  • result_dict (dict) – The dict to save the results. Default: None.

Returns

A dict containing video_name as keys and bsp_feature as value. If result_dict is not None, save the results to it.

Return type

bsp_feature_dict (dict)

mmaction.localization.generate_candidate_proposals(video_list, video_infos, tem_results_dir, temporal_scale, peak_threshold, tem_results_ext='.csv', result_dict=None)[source]

Generate Candidate Proposals with given temporal evaluation results. Each proposal file will contain: ‘tmin,tmax,tmin_score,tmax_score,score,match_iou,match_ioa’.

Parameters
  • video_list (list[int]) – List of video indexes to generate proposals.

  • video_infos (list[dict]) – List of video_info dict that contains ‘video_name’, ‘duration_frame’, ‘duration_second’, ‘feature_frame’, and ‘annotations’.

  • tem_results_dir (str) – Directory to load temporal evaluation results.

  • temporal_scale (int) – The number (scale) on temporal axis.

  • peak_threshold (float) – The threshold for proposal generation.

  • tem_results_ext (str) – File extension for temporal evaluation model output. Default: ‘.csv’.

  • result_dict (dict) – The dict to save the results. Default: None.

Returns

A dict containing video_name as keys and proposal list as value. If result_dict is not None, save the results to it.

Return type

dict

mmaction.localization.load_localize_proposal_file(filename)[source]

Load the proposal file and split it into many parts which contain one video’s information separately.

Parameters

filename (str) – Path to the proposal file.

Returns

List of all videos’ information.

Return type

list

mmaction.localization.perform_regression(detections)[source]

Perform regression on detection results.

Parameters

detections (list) – Detection results before regression.

Returns

Detection results after regression.

Return type

list

mmaction.localization.soft_nms(proposals, alpha, low_threshold, high_threshold, top_k)[source]

Soft NMS for temporal proposals.

Parameters
  • proposals (np.ndarray) – Proposals generated by network.

  • alpha (float) – Alpha value of Gaussian decaying function.

  • low_threshold (float) – Low threshold for soft nms.

  • high_threshold (float) – High threshold for soft nms.

  • top_k (int) – Top k values to be considered.

Returns

The updated proposals.

Return type

new_proposals (np.ndarray)
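
A plain-Python sketch of soft NMS with Gaussian decay on temporal proposals. It is an illustration under stated assumptions, not the library's code: proposals are (start, end, score) triples with times normalized to [0, 1], and the decay threshold is interpolated between low_threshold and high_threshold by the width of the selected proposal. The names segment_iou and soft_nms_sketch are hypothetical.

```python
import math


def segment_iou(a_start, a_end, b_start, b_end):
    """Intersection over union of two 1-D segments."""
    inter = max(0.0, min(a_end, b_end) - max(a_start, b_start))
    union = (a_end - a_start) + (b_end - b_start) - inter
    return inter / union if union > 0 else 0.0


def soft_nms_sketch(proposals, alpha, low_threshold, high_threshold, top_k):
    """Keep up to ``top_k`` proposals, decaying the scores of overlaps."""
    remaining = [list(p) for p in proposals]
    kept = []
    while remaining and len(kept) < top_k:
        # Select the highest-scoring proposal that is still available.
        best = max(range(len(remaining)), key=lambda i: remaining[i][2])
        b_start, b_end, b_score = remaining.pop(best)
        kept.append((b_start, b_end, b_score))
        # Assumption: wider proposals tolerate more overlap before decay.
        width = b_end - b_start
        threshold = low_threshold + (high_threshold - low_threshold) * width
        for prop in remaining:
            iou = segment_iou(b_start, b_end, prop[0], prop[1])
            if iou > threshold:
                # Gaussian decay instead of outright removal.
                prop[2] *= math.exp(-iou ** 2 / alpha)
    return kept
```

Unlike hard NMS, an overlapping proposal stays in the output with a reduced score, so it can still be kept if top_k allows.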

mmaction.localization.temporal_iop(proposal_min, proposal_max, gt_min, gt_max)[source]

Compute IoP score between a groundtruth bbox and the proposals.

Compute the IoP which is defined as the overlap ratio with groundtruth proportional to the duration of this proposal.

Parameters
  • proposal_min (list[float]) – List of temporal anchor min.

  • proposal_max (list[float]) – List of temporal anchor max.

  • gt_min (float) – Groundtruth temporal box min.

  • gt_max (float) – Groundtruth temporal box max.

Returns

List of intersection over anchor scores.

Return type

scores (list[float])
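
The IoP definition above (overlap with the groundtruth divided by the proposal's own duration) can be sketched in plain Python. The helper name temporal_iop_sketch is hypothetical and the code is an illustration, not the library's vectorized implementation.

```python
def temporal_iop_sketch(proposal_min, proposal_max, gt_min, gt_max):
    """Intersection over proposal length for each proposal."""
    scores = []
    for p_min, p_max in zip(proposal_min, proposal_max):
        # Length of the overlap between the proposal and the groundtruth.
        inter = max(0.0, min(p_max, gt_max) - max(p_min, gt_min))
        length = p_max - p_min
        scores.append(inter / length if length > 0 else 0.0)
    return scores
```

A proposal that lies entirely inside the groundtruth scores 1.0 regardless of how short it is, which is the main difference from IoU.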

mmaction.localization.temporal_iou(proposal_min, proposal_max, gt_min, gt_max)[source]

Compute IoU score between a groundtruth bbox and the proposals.

Parameters
  • proposal_min (list[float]) – List of temporal anchor min.

  • proposal_max (list[float]) – List of temporal anchor max.

  • gt_min (float) – Groundtruth temporal box min.

  • gt_max (float) – Groundtruth temporal box max.

Returns

List of iou scores.

Return type

jaccard (list[float])
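
For comparison with IoP, temporal IoU divides the overlap by the union of the proposal and the groundtruth. A minimal plain-Python sketch, with the hypothetical helper name temporal_iou_sketch:

```python
def temporal_iou_sketch(proposal_min, proposal_max, gt_min, gt_max):
    """Intersection over union between each proposal and one groundtruth."""
    gt_length = gt_max - gt_min
    scores = []
    for p_min, p_max in zip(proposal_min, proposal_max):
        inter = max(0.0, min(p_max, gt_max) - max(p_min, gt_min))
        # Union = both lengths minus the part counted twice.
        union = (p_max - p_min) + gt_length - inter
        scores.append(inter / union if union > 0 else 0.0)
    return scores
```

A short proposal fully inside a groundtruth of length 4 scores 0.5 here if it covers half of it, whereas its IoP would be 1.0.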

mmaction.localization.temporal_nms(detections, threshold)[source]

Perform temporal NMS on detection results.

Parameters
  • detections (list) – Detection results before NMS.

  • threshold (float) – Threshold of NMS.

Returns

Detection results after NMS.

Return type

list
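
Greedy temporal NMS can be sketched as follows. The detection layout is not specified in this reference, so the (start, end, score) triples used here are an assumption, as is the helper name temporal_nms_sketch.

```python
def temporal_nms_sketch(detections, threshold):
    """Greedy NMS over 1-D segments given as (start, end, score) triples."""

    def iou(a, b):
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        return inter / union if union > 0 else 0.0

    kept = []
    # Visit detections from the highest score to the lowest.
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        # Keep a detection only if it does not overlap a kept one too much.
        if all(iou(det, other) <= threshold for other in kept):
            kept.append(det)
    return kept
```

A lower threshold suppresses more aggressively; a threshold of 1.0 keeps every detection.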