API Reference¶

mmaction.core¶

optimizer¶

class mmaction.core.optimizer.CopyOfSGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)[source]¶

A clone of torch.optim.SGD.

A customized optimizer could be defined like CopyOfSGD. You may derive from built-in optimizers in torch.optim, or directly implement a new optimizer.

class mmaction.core.optimizer.TSMOptimizerConstructor(optimizer_cfg, paramwise_cfg=None)[source]¶

Optimizer constructor in TSM model.

This constructor builds optimizer in different ways from the default one.

Parameters of the first conv layer have default lr and weight decay.
Parameters of BN layers have default lr and zero weight decay.
If the field “fc_lr5” in paramwise_cfg is set to True, the parameters of the last fc layer in cls_head have 5x lr multiplier and 10x weight decay multiplier.
Weights of other layers have default lr and weight decay, and biases have a 2x lr multiplier and zero weight decay.
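
The rules above can be summarised as a small lookup. This is only an illustrative sketch of the multipliers as they are described here, not the constructor's actual code; the function name and the `kind` labels are hypothetical.

```python
def tsm_lr_and_wd(kind, base_lr, base_wd, fc_lr5=False):
    """Return (lr, weight_decay) for one parameter, following the list above.

    `kind` is a hypothetical label: 'first_conv', 'bn', 'last_fc_weight',
    'last_fc_bias', 'weight' or 'bias'.
    """
    if kind == 'first_conv':
        return base_lr, base_wd            # default lr and weight decay
    if kind == 'bn':
        return base_lr, 0.0                # default lr, zero weight decay
    if fc_lr5 and kind.startswith('last_fc'):
        return 5 * base_lr, 10 * base_wd   # multipliers as stated above
    if kind.endswith('bias'):
        return 2 * base_lr, 0.0            # biases: 2x lr, zero weight decay
    return base_lr, base_wd                # remaining weights: defaults


for kind in ('first_conv', 'bn', 'last_fc_weight', 'weight', 'bias'):
    print(kind, tsm_lr_and_wd(kind, base_lr=0.01, base_wd=1e-4, fc_lr5=True))
```

Each returned pair would correspond to one parameter group appended to params by add_params.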

add_params(params, model)[source]¶

Add parameters and their corresponding lr and wd to the params.

Parameters

params (list) – The list to be modified, containing all parameter groups and their corresponding lr and wd configurations.
model (nn.Module) – The model to be trained with the optimizer.

evaluation¶

class mmaction.core.evaluation.DistEvalHook(dataloader, interval=1, gpu_collect=False, save_best=True, key_indicator='top1_acc', rule=None, **eval_kwargs)[source]¶

Distributed evaluation hook.

This hook performs evaluation at a fixed interval when training in a distributed environment.

Parameters

dataloader (DataLoader) – A PyTorch dataloader.
interval (int) – Evaluation interval (by epochs). Default: 1.
gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.
save_best (bool) – Whether to save best checkpoint during evaluation. Default: True.
key_indicator (str | None) – Key indicator used to select the best checkpoint during evaluation when save_best is set to True. Options are the evaluation metrics of the test dataset, e.g., top1_acc, top5_acc, mean_class_accuracy, mean_average_precision for action recognition datasets (RawframeDataset and VideoDataset), and AR@AN, auc for action localization datasets (ActivityNetDataset). Default: top1_acc.
rule (str | None) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Default: ‘None’.
eval_kwargs (dict, optional) – Arguments for evaluation.

after_train_epoch(runner)[source]¶: Called after each training epoch to evaluate the model.

class mmaction.core.evaluation.EvalHook(dataloader, interval=1, gpu_collect=False, save_best=True, key_indicator='top1_acc', rule=None, **eval_kwargs)[source]¶

Non-Distributed evaluation hook.

This hook performs evaluation at a fixed interval when training in a non-distributed environment.

Parameters

dataloader (DataLoader) – A PyTorch dataloader.
interval (int) – Evaluation interval (by epochs). Default: 1.
gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.
save_best (bool) – Whether to save best checkpoint during evaluation. Default: True.
key_indicator (str | None) – Key indicator used to select the best checkpoint during evaluation when save_best is set to True. Options are the evaluation metrics of the test dataset, e.g., top1_acc, top5_acc, mean_class_accuracy, mean_average_precision for action recognition datasets (RawframeDataset and VideoDataset), and AR@AN, auc for action localization datasets (ActivityNetDataset). Default: top1_acc.
rule (str | None) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Default: ‘None’.
eval_kwargs (dict, optional) – Arguments for evaluation.
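
How interval, key_indicator and rule interact can be sketched in plain Python. The helper names are hypothetical, the zero-based epoch counter is an assumption, and the rule names 'greater' and 'less' are assumed for illustration only; the hook's real bookkeeping may differ.

```python
def should_evaluate(epoch, interval=1):
    """True when the (zero-based) epoch that just finished is an evaluation epoch."""
    return (epoch + 1) % interval == 0


def update_best(best_score, eval_res, key_indicator='top1_acc', rule='greater'):
    """Return (new_best_score, improved) given one evaluation result dict."""
    score = eval_res[key_indicator]
    if best_score is None:
        return score, True
    # 'greater' suits accuracy-like metrics, 'less' suits loss-like metrics.
    improved = score > best_score if rule == 'greater' else score < best_score
    return (score if improved else best_score), improved


best = None
for epoch, acc in enumerate([0.50, 0.62, 0.58, 0.70]):
    if should_evaluate(epoch, interval=2):
        best, improved = update_best(best, {'top1_acc': acc})
        print(epoch, best, improved)
```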

after_train_epoch(runner)[source]¶: Called after every training epoch to evaluate the results.

evaluate(runner, results)[source]¶

Evaluate the results.

Parameters

runner (mmcv.Runner) – The underlying training runner.
results (list) – Output results.

mmaction.core.evaluation.average_precision_at_temporal_iou(ground_truth, prediction, temporal_iou_thresholds=array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]))[source]¶

Compute average precision (in detection task) between ground truth and predicted data frames. If multiple predictions match the same ground truth segment, only the one with the highest score is counted as a true positive. This code is greatly inspired by the Pascal VOC devkit.

Parameters

ground_truth (dict) – Dict containing the ground truth instances. Key: ‘video_id’ Value (np.ndarray): 1D array of ‘t-start’ and ‘t-end’.
prediction (np.ndarray) – 2D array containing the information of proposal instances, including ‘video_id’, ‘class_id’, ‘t-start’, ‘t-end’ and ‘score’.
temporal_iou_thresholds (np.ndarray) – 1D array with temporal_iou thresholds. Default: np.linspace(0.5, 0.95, 10).

Returns

1D array of average precision score.

Return type

np.ndarray

mmaction.core.evaluation.average_recall_at_avg_proposals(ground_truth, proposals, total_num_proposals, max_avg_proposals=None, temporal_iou_thresholds=array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]))[source]¶

Computes the average recall given an average number (percentile) of proposals per video.

Parameters

ground_truth (dict) – Dict containing the ground truth instances.
proposals (dict) – Dict containing the proposal instances.
total_num_proposals (int) – Total number of proposals in the proposal dict.
max_avg_proposals (int | None) – Max number of proposals for one video. Default: None.
temporal_iou_thresholds (np.ndarray) – 1D array with temporal_iou thresholds. Default: np.linspace(0.5, 0.95, 10).

Returns

(recall, average_recall, proposals_per_video, auc). In recall, recall[i, j] is the recall at the i-th temporal_iou threshold and the j-th average number (percentile) of proposals per video. average_recall is the recall averaged over the list of temporal_iou thresholds (1D array), which is equivalent to recall.mean(axis=0). proposals_per_video is the average number of proposals per video. auc is the area under the AR@AN curve.

Return type

tuple([np.ndarray, np.ndarray, np.ndarray, float])

mmaction.core.evaluation.confusion_matrix(y_pred, y_real, normalize=None)[source]¶

Compute confusion matrix.

Parameters

y_pred (list[int] | np.ndarray[int]) – Prediction labels.
y_real (list[int] | np.ndarray[int]) – Ground truth labels.
normalize (str | None) – Normalizes confusion matrix over the true (rows), predicted (columns) conditions or all the population. If None, confusion matrix will not be normalized. Options are “true”, “pred”, “all”, None. Default: None.

Returns

Confusion matrix.

Return type

np.ndarray
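
The normalize options can be illustrated with a small NumPy sketch. It assumes labels are contiguous integers starting at 0 and is not the library's implementation; the function name is hypothetical.

```python
import numpy as np


def confusion_matrix_sketch(y_pred, y_real, normalize=None):
    """Rows index the true label, columns index the predicted label."""
    y_pred = np.asarray(y_pred)
    y_real = np.asarray(y_real)
    num_labels = int(max(y_pred.max(), y_real.max())) + 1
    mat = np.zeros((num_labels, num_labels), dtype=np.int64)
    np.add.at(mat, (y_real, y_pred), 1)      # count every (true, pred) pair
    if normalize is None:
        return mat
    mat = mat.astype(float)
    if normalize == 'true':                  # each row sums to 1
        denom = mat.sum(axis=1, keepdims=True)
    elif normalize == 'pred':                # each column sums to 1
        denom = mat.sum(axis=0, keepdims=True)
    elif normalize == 'all':                 # whole matrix sums to 1
        denom = mat.sum()
    else:
        raise ValueError(f'unsupported normalize option: {normalize!r}')
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.nan_to_num(mat / denom)    # empty rows/columns become 0


print(confusion_matrix_sketch([0, 1, 1, 2], [0, 1, 2, 2], normalize='true'))
```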

mmaction.core.evaluation.get_weighted_score(score_list, coeff_list)[source]¶

Get weighted score with given scores and coefficients.

Given n predictions by different classifiers: [score_1, score_2, …, score_n] (score_list) and their coefficients: [coeff_1, coeff_2, …, coeff_n] (coeff_list), return weighted score: weighted_score = score_1 * coeff_1 + score_2 * coeff_2 + … + score_n * coeff_n

Parameters

score_list (list[list[np.ndarray]]) – List of list of scores, with shape n(number of predictions) X num_samples X num_classes
coeff_list (list[float]) – List of coefficients, with shape n.

Returns

List of weighted scores.

Return type

list[np.ndarray]
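
The weighted sum above can be written out per sample as follows. This is a minimal sketch of the formula, with a hypothetical function name and made-up scores for two classifiers.

```python
import numpy as np


def weighted_score_sketch(score_list, coeff_list):
    """score_list: n x num_samples x num_classes; coeff_list: n coefficients."""
    assert len(score_list) == len(coeff_list)
    num_samples = len(score_list[0])
    return [
        sum(np.asarray(scores[i]) * coeff
            for scores, coeff in zip(score_list, coeff_list))
        for i in range(num_samples)
    ]


# Made-up scores from two classifiers for two samples and two classes.
scores_a = [np.array([0.2, 0.8]), np.array([0.6, 0.4])]
scores_b = [np.array([0.4, 0.6]), np.array([0.0, 1.0])]
fused = weighted_score_sketch([scores_a, scores_b], [1.0, 0.5])
print(fused)
```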

mmaction.core.evaluation.mean_average_precision(scores, labels)[source]¶

Mean average precision for multi-label recognition.

Parameters

scores (list[np.ndarray]) – Prediction scores for each class.
labels (list[np.ndarray]) – Ground truth many-hot vector.

Returns

The mean average precision.

Return type

np.float

mmaction.core.evaluation.mean_class_accuracy(scores, labels)[source]¶

Calculate mean class accuracy.

Parameters

scores (list[np.ndarray]) – Prediction scores for each class.
labels (list[int]) – Ground truth labels.

Returns

Mean class accuracy.

Return type

np.ndarray
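
As an illustration of the metric, the sketch below takes the arg-max prediction of each sample, computes the accuracy of each class separately and averages the results. It averages only over classes that occur in labels, which is an assumption; the library may treat absent classes differently.

```python
def mean_class_accuracy_sketch(scores, labels):
    """scores: per-sample class scores; labels: ground truth class indices."""
    preds = [max(range(len(s)), key=lambda c: s[c]) for s in scores]
    hits, counts = {}, {}
    for pred, label in zip(preds, labels):
        counts[label] = counts.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + int(pred == label)
    per_class = [hits[c] / counts[c] for c in counts]
    return sum(per_class) / len(per_class)


# Class 0 is always right (1/1), class 1 is right half of the time (1/2).
print(mean_class_accuracy_sketch([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]], [0, 1, 1]))
```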

mmaction.core.evaluation.pairwise_temporal_iou(candidate_segments, target_segments)[source]¶

Compute intersection over union between segments.

Parameters

candidate_segments (np.ndarray) – 1-dim/2-dim array in format [init, end]/[m x 2:=[init, end]].
target_segments (np.ndarray) – 2-dim array in format [n x 2:=[init, end]].

Returns

1-dim array [n] / 2-dim array [n x m] with IoU ratio.

Return type

np.ndarray
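
A broadcasting-based sketch of temporal IoU with the shapes described above. It assumes every segment has a positive duration and is not the library's code; the function name is hypothetical.

```python
import numpy as np


def temporal_iou_sketch(candidate_segments, target_segments):
    """Return IoU of shape [n] (1-dim candidate) or [n x m] (2-dim candidates)."""
    cand = np.atleast_2d(np.asarray(candidate_segments, dtype=float))  # (m, 2)
    tgt = np.asarray(target_segments, dtype=float)                     # (n, 2)
    start = np.maximum(tgt[:, None, 0], cand[None, :, 0])              # (n, m)
    end = np.minimum(tgt[:, None, 1], cand[None, :, 1])                # (n, m)
    inter = np.clip(end - start, 0, None)       # disjoint segments give 0
    union = ((tgt[:, None, 1] - tgt[:, None, 0])
             + (cand[None, :, 1] - cand[None, :, 0]) - inter)
    iou = inter / union
    return iou[:, 0] if np.ndim(candidate_segments) == 1 else iou


targets = [[0, 5], [5, 15], [20, 30]]
print(temporal_iou_sketch([0, 10], targets))
```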

mmaction.core.evaluation.softmax(x, dim=1)[source]¶: Compute softmax values for each set of scores in x.
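
For reference, a numerically stable softmax along a chosen axis can be sketched as below. Subtracting the per-row maximum before exponentiating is a common stabilisation trick and an assumption here, not necessarily what the library does.

```python
import numpy as np


def softmax_sketch(x, dim=1):
    x = np.asarray(x, dtype=float)
    shifted = x - x.max(axis=dim, keepdims=True)   # avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=dim, keepdims=True)  # each slice sums to 1


probs = softmax_sketch([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
print(probs.sum(axis=1))
```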

mmaction.core.evaluation.top_k_accuracy(scores, labels, topk=(1,))[source]¶

Calculate top k accuracy score.

Parameters

scores (list[np.ndarray]) – Prediction scores for each class.
labels (list[int]) – Ground truth labels.
topk (tuple[int]) – K value for top_k_accuracy. Default: (1, ).

Returns

Top k accuracy score for each k.

Return type

list[float]
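
A sample counts as correct at k when its ground truth label is among the k highest-scoring classes. The sketch below illustrates this with made-up scores; the function name is hypothetical and this is not the library's implementation.

```python
import numpy as np


def top_k_accuracy_sketch(scores, labels, topk=(1,)):
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    order = np.argsort(scores, axis=1)[:, ::-1]    # classes by descending score
    result = []
    for k in topk:
        hit = (order[:, :k] == labels[:, None]).any(axis=1)
        result.append(float(hit.mean()))
    return result


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1, top2 = top_k_accuracy_sketch(scores, labels, topk=(1, 2))
print(top1, top2)
```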

fp16¶

dist utils¶

mmaction.core.dist_utils.allreduce_grads(params, coalesce=True, bucket_size_mb=-1)[source]¶

Allreduce gradients.

Parameters

params (list[torch.Parameters]) – List of parameters of a model.
coalesce (bool, optional) – Whether to allreduce parameters as a whole. Default: True.
bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Default: -1.

mmaction.models¶

recognizers¶

localizers¶

common¶

backbones¶

heads¶

losses¶

mmaction.datasets¶

datasets¶

class mmaction.datasets.ActivityNetDataset(ann_file, pipeline, data_prefix=None, test_mode=False)[source]¶

ActivityNet dataset for temporal action localization.

The dataset loads raw features and applies specified transforms to return a dict containing the frame tensors and other information.

The ann_file is a json file with multiple objects. Each object has the name of a video as its key, and its value holds the total frames of the video, the total seconds of the video, the annotations of the video, the feature frames (frames covered by features) of the video, fps and rfps. Example of an annotation file:

{
    "v_--1DO2V4K74":  {
        "duration_second": 211.53,
        "duration_frame": 6337,
        "annotations": [
            {
                "segment": [
                    30.025882995319815,
                    205.2318595943838
                ],
                "label": "Rock climbing"
            }
        ],
        "feature_frame": 6336,
        "fps": 30.0,
        "rfps": 29.9579255898
    },
    "v_--6bJUbfpnQ": {
        "duration_second": 26.75,
        "duration_frame": 647,
        "annotations": [
            {
                "segment": [
                    2.578755070202808,
                    24.914101404056165
                ],
                "label": "Drinking beer"
            }
        ],
        "feature_frame": 624,
        "fps": 24.0,
        "rfps": 24.1869158879
    },
    ...
}
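
Reading such a file needs only the standard json module. The sketch below parses the two entries from the example and stores the video name alongside the other fields; using the key "video_name" for this is an assumption made for illustration.

```python
import json

# The two entries from the example above (the trailing "..." is dropped).
ann_text = """
{
    "v_--1DO2V4K74": {
        "duration_second": 211.53,
        "duration_frame": 6337,
        "annotations": [
            {"segment": [30.025882995319815, 205.2318595943838],
             "label": "Rock climbing"}
        ],
        "feature_frame": 6336,
        "fps": 30.0,
        "rfps": 29.9579255898
    },
    "v_--6bJUbfpnQ": {
        "duration_second": 26.75,
        "duration_frame": 647,
        "annotations": [
            {"segment": [2.578755070202808, 24.914101404056165],
             "label": "Drinking beer"}
        ],
        "feature_frame": 624,
        "fps": 24.0,
        "rfps": 24.1869158879
    }
}
"""

video_infos = []
for video_name, info in json.loads(ann_text).items():
    entry = dict(info)
    entry['video_name'] = video_name   # assumed key name for the video id
    video_infos.append(entry)

for entry in video_infos:
    segments = [ann['segment'] for ann in entry['annotations']]
    print(entry['video_name'], entry['duration_second'], segments)
```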

Parameters

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
data_prefix (str) – Path to a directory where videos are held. Default: None.
test_mode (bool) – Store True when building test or validation dataset. Default: False.

dump_results(results, out, output_format, version='VERSION 1.3')[source]¶: Dump data to json/csv files.

evaluate(results, metrics='AR@AN', max_avg_proposals=100, temporal_iou_thresholds=array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]), logger=None)[source]¶

Evaluation in feature dataset.

Parameters

results (list[dict]) – Output results.
metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘AR@AN’.
max_avg_proposals (int) – Max number of proposals to evaluate. Defaults: 100.
temporal_iou_thresholds (list) – Temporal IoU threshold for positive samples. Defaults: np.linspace(0.5, 0.95, 10).
logger (logging.Logger | None) – Training logger. Defaults: None.

Returns

Evaluation results for evaluation metrics.

Return type

dict

load_annotations()[source]¶: Load the annotation according to ann_file into video_infos.

prepare_test_frames(idx)[source]¶: Prepare the frames for testing given the index.

prepare_train_frames(idx)[source]¶: Prepare the frames for training given the index.

proposals2json(results, show_progress=False)[source]¶

Convert all proposals to a final dict(json) format.

Parameters

results (list[dict]) – All proposals.
show_progress (bool) – Whether to show the progress bar. Defaults: False.

Returns

The final result dict. E.g.

dict(video-1=[dict(segment=[1.1, 2.0], score=0.9),
              dict(segment=[50.1, 129.3], score=0.6)])

Return type

dict

class mmaction.datasets.BaseDataset(ann_file, pipeline, data_prefix=None, test_mode=False, multi_class=False, num_classes=None, start_index=1, modality='RGB')[source]¶

Base class for datasets.

All datasets that process video should subclass it. All subclasses should overwrite:

- load_annotations, which loads information from an annotation file.
- prepare_train_frames, which provides train data.
- prepare_test_frames, which provides test data.

Parameters

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
data_prefix (str) – Path to a directory where videos are held. Default: None.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
multi_class (bool) – Determines whether the dataset is a multi-class dataset. Default: False.
num_classes (int) – Number of classes of the dataset, used in multi-class datasets. Default: None.
start_index (int) – Specify a start index for frames in consideration of different filename format. However, when taking videos as input, it should be set to 0, since frames loaded from videos count from 0. Default: 1.
modality (str) – Modality of data. Support ‘RGB’, ‘Flow’. Default: ‘RGB’.

dump_results(results, out)[source]¶: Dump data to json/yaml/pickle strings or files.

abstract evaluate(results, metrics, logger)[source]¶

Evaluation for the dataset.

Parameters

results (list) – Output results.
metrics (str | sequence[str]) – Metrics to be performed.
logger (logging.Logger | None) – Logger for recording.

Returns

Evaluation results dict.

Return type

dict

abstract load_annotations()[source]¶: Load the annotation according to ann_file into video_infos.

load_json_annotations()[source]¶: Load json annotation file to get video information.

prepare_test_frames(idx)[source]¶: Prepare the frames for testing given the index.

prepare_train_frames(idx)[source]¶: Prepare the frames for training given the index.

class mmaction.datasets.RawframeDataset(ann_file, pipeline, data_prefix=None, test_mode=False, filename_tmpl='img_{:05}.jpg', with_offset=False, multi_class=False, num_classes=None, start_index=1, modality='RGB')[source]¶

Rawframe dataset for action recognition.

The dataset loads raw frames and applies specified transforms to return a dict containing the frame tensors and other information.

The ann_file is a text file with multiple lines. Each line gives the directory holding the frames of a video, the total number of frames of the video and the label of the video, separated by whitespace. Example of an annotation file:

some/directory-1 163 1
some/directory-2 122 1
some/directory-3 258 2
some/directory-4 234 2
some/directory-5 295 3
some/directory-6 121 3

Example of a multi-class annotation file:

some/directory-1 163 1 3 5
some/directory-2 122 1 2
some/directory-3 258 2
some/directory-4 234 2 4 6 8
some/directory-5 295 3
some/directory-6 121 3

Example of a with_offset annotation file (clips from long videos). Each line gives the directory holding the frames of a video, the index of the start frame, the total number of frames of the video clip and the label of the video clip, separated by whitespace:

some/directory-1 12 163 3
some/directory-2 213 122 4
some/directory-3 100 258 5
some/directory-4 98 234 2
some/directory-5 0 295 3
some/directory-6 50 121 3
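
The three line layouts above differ only in an optional offset column and in how many labels follow. A minimal parser sketch, with hypothetical function and key names, could look like this:

```python
def parse_rawframe_line(line, with_offset=False, multi_class=False):
    """Split one annotation line into its fields (key names are illustrative)."""
    parts = line.split()
    info = {'frame_dir': parts[0]}
    idx = 1
    if with_offset:
        info['offset'] = int(parts[idx])      # index of the start frame
        idx += 1
    info['total_frames'] = int(parts[idx])
    idx += 1
    labels = [int(x) for x in parts[idx:]]
    info['label'] = labels if multi_class else labels[0]
    return info


print(parse_rawframe_line('some/directory-1 163 1'))
print(parse_rawframe_line('some/directory-4 234 2 4 6 8', multi_class=True))
print(parse_rawframe_line('some/directory-2 213 122 4', with_offset=True))
```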

Parameters

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
data_prefix (str) – Path to a directory where videos are held. Default: None.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
filename_tmpl (str) – Template for each filename. Default: ‘img_{:05}.jpg’.
with_offset (bool) – Determines whether the offset information is in ann_file. Default: False.
multi_class (bool) – Determines whether it is a multi-class recognition dataset. Default: False.
num_classes (int) – Number of classes in the dataset. Default: None.
modality (str) – Modality of data. Support ‘RGB’, ‘Flow’. Default: ‘RGB’.

evaluate(results, metrics='top_k_accuracy', topk=(1, 5), logger=None)[source]¶

Evaluation in rawframe dataset.

Parameters

results (list) – Output results.
metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘top_k_accuracy’.
topk (int | tuple[int]) – K value for top_k_accuracy metric. Defaults: (1, 5).
logger (logging.Logger | None) – Logger for recording. Default: None.

Returns

Evaluation results dict.

Return type

dict

load_annotations()[source]¶: Load annotation file to get video information.

prepare_test_frames(idx)[source]¶: Prepare the frames for testing given the index.

prepare_train_frames(idx)[source]¶: Prepare the frames for training given the index.

class mmaction.datasets.RepeatDataset(dataset, times)[source]¶

A wrapper of repeated dataset.

The length of the repeated dataset is `times` times the length of the original dataset. This is useful when the data loading time is long but the dataset is small. Using RepeatDataset can reduce the data loading time between epochs.

Parameters

dataset (Dataset) – The dataset to be repeated.
times (int) – Repeat times.
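
The wrapping idea can be shown in a few lines: report a longer length and map every index back into the original range. This is a simplified sketch with a hypothetical class name, not the library's class.

```python
class RepeatSketch:
    """Present `dataset` as if it had been concatenated `times` times."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __getitem__(self, idx):
        # Wrap around so indices beyond the original length reuse samples.
        return self.dataset[idx % self._ori_len]

    def __len__(self):
        return self.times * self._ori_len


repeated = RepeatSketch(['a', 'b', 'c'], times=4)
print(len(repeated), repeated[0], repeated[4], repeated[11])
```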

class mmaction.datasets.SSNDataset(ann_file, pipeline, train_cfg, test_cfg, data_prefix, test_mode=False, filename_tmpl='img_{:05d}.jpg', start_index=1, modality='RGB', video_centric=True, reg_normalize_constants=None, body_segments=5, aug_segments=(2, 2), aug_ratio=(0.5, 0.5), clip_len=1, frame_interval=1, filter_gt=True, use_regression=True, verbose=False)[source]¶

Proposal frame dataset for Structured Segment Networks.

Based on proposal information, the dataset loads raw frames and applies specified transforms to return a dict containing the frame tensors and other information.

The ann_file is a text file with multiple lines and each video’s information takes up several lines. This file can be a normalized file with percent or standard file with specific frame indexes. If the file is a normalized file, it will be converted into a standard file first.

Template information of a video in a standard file:

# index
video_id
num_frames
fps
num_gts
label, start_frame, end_frame
label, start_frame, end_frame
…
num_proposals
label, best_iou, overlap_self, start_frame, end_frame
label, best_iou, overlap_self, start_frame, end_frame
…

Example of a standard annotation file:

# 0
video_validation_0000202
5666
1
3
8 130 185
8 832 1136
8 1303 1381
5
8 0.0620 0.0620 790 5671
8 0.1656 0.1656 790 2619
8 0.0833 0.0833 3945 5671
8 0.0960 0.0960 4173 5671
8 0.0614 0.0614 3327 5671
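
One video record in the standard format can be read field by field in the order given by the template. The sketch below splits on whitespace, so it works whether the fields sit on separate lines or on one line; the function and key names are hypothetical.

```python
def parse_ssn_record(text):
    """Parse one standard-format video record into a dict."""
    tokens = text.replace(',', ' ').split()
    assert tokens[0] == '#', 'a record starts with "# index"'
    record = {
        'index': int(tokens[1]),
        'video_id': tokens[2],
        'num_frames': int(tokens[3]),
        'fps': float(tokens[4]),
    }
    num_gts = int(tokens[5])
    pos = 6
    record['gts'] = []
    for _ in range(num_gts):
        label, start, end = tokens[pos:pos + 3]
        record['gts'].append((int(label), int(start), int(end)))
        pos += 3
    num_proposals = int(tokens[pos])
    pos += 1
    record['proposals'] = []
    for _ in range(num_proposals):
        label, best_iou, overlap_self, start, end = tokens[pos:pos + 5]
        record['proposals'].append(
            (int(label), float(best_iou), float(overlap_self), int(start), int(end)))
        pos += 5
    return record


example = (
    '# 0 video_validation_0000202 5666 1 3 '
    '8 130 185 8 832 1136 8 1303 1381 5 '
    '8 0.0620 0.0620 790 5671 8 0.1656 0.1656 790 2619 '
    '8 0.0833 0.0833 3945 5671 8 0.0960 0.0960 4173 5671 '
    '8 0.0614 0.0614 3327 5671'
)
record = parse_ssn_record(example)
print(record['video_id'], len(record['gts']), len(record['proposals']))
```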

Parameters

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
train_cfg (dict) – Config for training.
test_cfg (dict) – Config for testing.
data_prefix (str) – Path to a directory where videos are held.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
filename_tmpl (str) – Template for each filename. Default: ‘img_{:05d}.jpg’.
start_index (int) – Specify a start index for frames in consideration of different filename format. Default: 1.
modality (str) – Modality of data. Support ‘RGB’, ‘Flow’. Default: ‘RGB’.
video_centric (bool) – Whether to sample proposals just from this video or sample proposals randomly from the entire dataset. Default: True.
reg_normalize_constants (list) – Regression target normalized constants, including mean and standard deviation of location and duration.
body_segments (int) – Number of segments in course period. Default: 5.
aug_segments (list[int]) – Number of segments in starting and ending period. Default: (2, 2).
aug_ratio (int | float | tuple[int | float]) – The ratio of the length of augmentation to that of the proposal. Default: (0.5, 0.5).
clip_len (int) – Frames of each sampled output clip. Default: 1.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.
filter_gt (bool) – Whether to filter videos with no annotation during training. Default: True.
use_regression (bool) – Whether to perform regression. Default: True.
verbose (bool) – Whether to print full information or not. Default: False.

construct_proposal_pools()[source]¶: Construct the positive proposal pool, incomplete proposal pool and background proposal pool of the entire dataset.

evaluate(results, metrics='mAP', eval_dataset='thumos14', **kwargs)[source]¶

Evaluation in SSN proposal dataset.

Parameters

results (list[dict]) – Output results.
metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘mAP’.
eval_dataset (str) – Dataset to be evaluated.

Returns

Evaluation results for evaluation metrics.

Return type

dict

get_all_gts()[source]¶: Fetch groundtruth instances of the entire dataset.

get_negatives(proposals, incomplete_iou_threshold, background_iou_threshold, background_coverage_threshold=0.01, incomplete_overlap_threshold=0.7)[source]¶

Get negative proposals, including incomplete proposals and background proposals.

Parameters

proposals (list) – List of proposal instances(SSNInstance).
incomplete_iou_threshold (float) – Maximum threshold of overlap of incomplete proposals and groundtruths.
background_iou_threshold (float) – Maximum threshold of overlap of background proposals and groundtruths.
background_coverage_threshold (float) – Minimum coverage of background proposals in video duration. Default: 0.01.
incomplete_overlap_threshold (float) – Minimum percent of incomplete proposals’ own span contained in a groundtruth instance. Default: 0.7.

Returns

(incompletes, backgrounds), where incompletes and backgrounds are lists of incomplete proposal instances and background proposal instances, respectively.

Return type

list[SSNInstance]

get_positives(gts, proposals, positive_threshold, with_gt=True)[source]¶

Get positive/foreground proposals.

Parameters

gts (list) – List of groundtruth instances(SSNInstance).
proposals (list) – List of proposal instances(SSNInstance).
positive_threshold (float) – Minimum threshold of overlap of positive/foreground proposals and groundtruths.
with_gt (bool) – Whether to include groundtruth instances in positive proposals. Default: True.

Returns

(positives), where positives is a list of positive proposal instances.

Return type

list[SSNInstance]

load_annotations()[source]¶: Load annotation file to get video information.

prepare_test_frames(idx)[source]¶: Prepare the frames for testing given the index.

prepare_train_frames(idx)[source]¶: Prepare the frames for training given the index.

results_to_detections(results, top_k=2000, softmax_before_filter=True, cls_top_k=2, **kwargs)[source]¶

Convert prediction results into detections.

Parameters

results (list) – Prediction results.
top_k (int) – Number of top results. Default: 2000.
softmax_before_filter (bool) – Whether to perform softmax operations before filtering results. Default: True.
cls_top_k (int) – Number of top results for each class. Default: 2.

Returns

Detection results.

Return type

list

class mmaction.datasets.VideoDataset(ann_file, pipeline, start_index=0, **kwargs)[source]¶

Video dataset for action recognition.

The dataset loads raw videos and applies specified transforms to return a dict containing the frame tensors and other information.

The ann_file is a text file with multiple lines. Each line describes a sample video by its filepath and label, separated by whitespace. Example of an annotation file:

some/path/000.mp4 1
some/path/001.mp4 1
some/path/002.mp4 2
some/path/003.mp4 2
some/path/004.mp4 3
some/path/005.mp4 3

Parameters

ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
start_index (int) – Specify a start index for frames in consideration of different filename format. However, when taking videos as input, it should be set to 0, since frames loaded from videos count from 0. Default: 0.
**kwargs – Keyword arguments for BaseDataset.

evaluate(results, metrics='top_k_accuracy', topk=(1, 5), logger=None)[source]¶

Evaluation in video dataset.

Parameters

results (list) – Output results.
metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘top_k_accuracy’.
topk (tuple[int]) – K value for top_k_accuracy metric. Defaults: (1, 5).
logger (logging.Logger | None) – Logger for recording. Default: None.

Returns

Evaluation results dict.

Return type

dict

load_annotations()[source]¶: Load annotation file to get video information.

mmaction.datasets.build_dataloader(dataset, videos_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, seed=None, drop_last=False, pin_memory=True, **kwargs)[source]¶

Build PyTorch DataLoader.

In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.

Parameters

dataset (Dataset) – A PyTorch dataset.
videos_per_gpu (int) – Number of videos on each GPU, i.e., batch size of each GPU.
workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.
num_gpus (int) – Number of GPUs. Only used in non-distributed training. Default: 1.
dist (bool) – Distributed training/test or not. Default: True.
shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.
seed (int | None) – Seed to be used. Default: None.
drop_last (bool) – Whether to drop the last incomplete batch in epoch. Default: False.
pin_memory (bool) – Whether to use pin_memory in DataLoader. Default: True.
kwargs (dict, optional) – Any keyword argument to be used to initialize DataLoader.

Returns

A PyTorch dataloader.

Return type

DataLoader
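
The difference between the two modes mostly comes down to how the per-GPU numbers turn into the arguments of a single DataLoader. The sketch below assumes the non-distributed loader simply scales both values by num_gpus; this scaling is an assumption for illustration, since the text above only states that num_gpus is used in non-distributed training.

```python
def loader_sizes(videos_per_gpu, workers_per_gpu, num_gpus=1, dist=True):
    """Return (batch_size, num_workers) for one DataLoader (illustrative)."""
    if dist:
        # One loader per GPU/process: use the per-GPU values directly.
        return videos_per_gpu, workers_per_gpu
    # One loader shared by all GPUs: assumed to scale with the GPU count.
    return num_gpus * videos_per_gpu, num_gpus * workers_per_gpu


print(loader_sizes(8, 4, num_gpus=2, dist=True))
print(loader_sizes(8, 4, num_gpus=2, dist=False))
```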

mmaction.datasets.build_dataset(cfg, default_args=None)[source]¶

Build a dataset from config dict.

Parameters

cfg (dict) – Config dict. It should at least contain the key “type”.
default_args (dict, optional) – Default initialization arguments. Default: None.

Returns

The constructed dataset.

Return type

Dataset
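
Building from a config dict generally means looking up the class named by "type" and passing the remaining entries as keyword arguments. The registry, the toy class and the rule that cfg entries take precedence over default_args are all assumptions in this sketch, which does not use the library's own registry.

```python
DATASETS = {}  # hypothetical registry: class name -> class


def register(cls):
    DATASETS[cls.__name__] = cls
    return cls


@register
class ToyDataset:
    def __init__(self, ann_file, test_mode=False):
        self.ann_file = ann_file
        self.test_mode = test_mode


def build_dataset_sketch(cfg, default_args=None):
    args = dict(cfg)                      # do not mutate the caller's config
    for key, value in (default_args or {}).items():
        args.setdefault(key, value)       # assumed: cfg entries take precedence
    dataset_type = args.pop('type')       # the one mandatory key
    return DATASETS[dataset_type](**args)


dataset = build_dataset_sketch(
    dict(type='ToyDataset', ann_file='train.txt'),
    default_args=dict(test_mode=True))
print(type(dataset).__name__, dataset.ann_file, dataset.test_mode)
```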

pipelines¶

class mmaction.datasets.pipelines.CenterCrop(crop_size, lazy=False)[source]¶

Crop the center area from images.

Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox”, “lazy” and “img_shape”. The required key in “lazy” is “crop_bbox”, and the added or modified key is “crop_bbox”.

Parameters

crop_size (int | tuple[int]) – (w, h) of crop size.
lazy (bool) – Determine whether to apply lazy operation. Default: False.
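
The crop region follows directly from the image size and crop_size. The sketch below assumes the box is expressed as (left, top, right, bottom) in pixels, which is an assumption about the “crop_bbox” layout; the function name is hypothetical.

```python
def center_crop_bbox(img_h, img_w, crop_size):
    """Return (left, top, right, bottom) of a centered crop."""
    if isinstance(crop_size, int):
        crop_w, crop_h = crop_size, crop_size    # a single int means a square
    else:
        crop_w, crop_h = crop_size               # (w, h), as documented above
    left = (img_w - crop_w) // 2
    top = (img_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


print(center_crop_bbox(256, 340, 224))
```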

class mmaction.datasets.pipelines.Collect(keys, meta_keys=('filename', 'label', 'original_shape', 'img_shape', 'pad_shape', 'flip_direction', 'img_norm_cfg'), meta_name='img_meta')[source]¶

Collect data from the loader relevant to the specific task.

This keeps the items in keys as they are, and collects the items in meta_keys into a meta item called meta_name. This is usually the last stage of the data loader pipeline. For example, when keys='imgs', meta_keys=('filename', 'label', 'original_shape'), meta_name='img_meta', the result will be a dict with keys 'imgs' and 'img_meta', where 'img_meta' is a DataContainer of another dict with keys 'filename', 'label', 'original_shape'.

Parameters

keys (Sequence[str]) – Required keys to be collected.
meta_name (str) – The name of the key that contains meta information. This key is always populated. Default: “img_meta”.
meta_keys (Sequence[str]) – Keys that are collected under meta_name. The contents of the meta_name dictionary depend on meta_keys. By default this includes:
- “filename”: path to the image file
- “label”: label of the image file
- “original_shape”: original shape of the image as a tuple (h, w, c)
- “img_shape”: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.
- “pad_shape”: image shape after padding
- “flip_direction”: a str in (“horizontal”, “vertical”) indicating whether the image is flipped horizontally or vertically
- “img_norm_cfg”: a dict of normalization information:
  - mean - per channel mean subtraction
  - std - per channel std divisor
  - to_rgb - bool indicating if bgr was converted to rgb
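
The split between keys and meta_keys can be sketched with plain dicts. The DataContainer wrapper mentioned above is left out for brevity, and the function name and sample values are made up for illustration.

```python
def collect_sketch(results, keys, meta_keys, meta_name='img_meta'):
    """Keep `keys` as they are and gather `meta_keys` under `meta_name`."""
    data = {key: results[key] for key in keys}
    data[meta_name] = {key: results[key] for key in meta_keys}
    return data


results = {
    'imgs': [[0, 1], [2, 3]],
    'filename': 'some/path/000.mp4',
    'label': 1,
    'original_shape': (240, 320, 3),
    'frame_inds': [0, 4, 8],            # not requested, so it is dropped
}
collected = collect_sketch(
    results, keys=['imgs'],
    meta_keys=('filename', 'label', 'original_shape'))
print(sorted(collected), sorted(collected['img_meta']))
```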

class mmaction.datasets.pipelines.Compose(transforms)[source]¶

Compose a data pipeline with a sequence of transforms.

Parameters: transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.
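
Composition amounts to calling each transform on the output of the previous one. The sketch below handles only callables; turning config dicts into transform objects needs a registry and is omitted. Stopping early when a transform returns None is an assumed convention, and the class and transform names are hypothetical.

```python
class ComposeSketch:
    """Apply a sequence of callables to a results dict, in order."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, data):
        for transform in self.transforms:
            data = transform(data)
            if data is None:       # assumed convention: None aborts the pipeline
                return None
        return data


def add_total_frames(results):
    results['total_frames'] = len(results['imgs'])
    return results


def keep_first_two(results):
    results['imgs'] = results['imgs'][:2]
    return results


pipeline = ComposeSketch([add_total_frames, keep_first_two])
out = pipeline({'imgs': ['f0', 'f1', 'f2']})
print(out)
```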

class mmaction.datasets.pipelines.DecordDecode(**kwargs)[source]¶

Using decord to decode the video.

Decord: https://github.com/dmlc/decord

Required keys are “video_reader”, “filename” and “frame_inds”, added or modified keys are “imgs” and “original_shape”.

class mmaction.datasets.pipelines.DecordInit(io_backend='disk', num_threads=1, **kwargs)[source]¶

Using decord to initialize the video_reader.

Decord: https://github.com/dmlc/decord

Required keys are “filename”, added or modified keys are “video_reader” and “total_frames”.

class mmaction.datasets.pipelines.DenseSampleFrames(clip_len, frame_interval=1, num_clips=1, sample_range=64, num_sample_positions=10, temporal_jitter=False, out_of_bound_opt='loop', test_mode=False)[source]¶

Select frames from the video by dense sample strategy.

Required keys are “filename”, added or modified keys are “total_frames”, “frame_inds”, “frame_interval” and “num_clips”.

Parameters

clip_len (int) – Frames of each sampled output clip.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.
num_clips (int) – Number of clips to be sampled. Default: 1.
sample_range (int) – Total sample range for dense sample. Default: 64.
num_sample_positions (int) – Number of sample start positions, which is only used in test mode. Default: 10.
temporal_jitter (bool) – Whether to apply temporal jittering. Default: False.
test_mode (bool) – Store True when building test or validation dataset. Default: False.

class mmaction.datasets.pipelines.Flip(flip_ratio=0.5, direction='horizontal', lazy=False)[source]¶

Flip the input images with a probability.

Reverse the order of elements in the given imgs with a specific direction. The shape of the imgs is preserved, but the elements are reordered. Required keys are “imgs”, “img_shape”, “modality”, added or modified keys are “imgs”, “lazy” and “flip_direction”. There are no required keys in “lazy”; the added or modified keys are “flip” and “flip_direction”.

Parameters

flip_ratio (float) – Probability of implementing flip. Default: 0.5.
direction (str) – Flip imgs horizontally or vertically. Options are “horizontal” | “vertical”. Default: “horizontal”.
lazy (bool) – Determine whether to apply lazy operation. Default: False.

class mmaction.datasets.pipelines.FormatShape(input_format)[source]¶

Format final imgs shape to the given input_format.

Required keys are “imgs”, “num_clips” and “clip_len”, added or modified keys are “imgs” and “input_shape”.

Parameters: input_format (str) – Define the final imgs format.

class mmaction.datasets.pipelines.FrameSelector(*args, **kwargs)[source]¶: Deprecated class for RawFrameDecode.

class mmaction.datasets.pipelines.Fuse[source]¶

Fuse lazy operations.

Fusion order: crop -> resize -> flip

Required keys are “imgs”, “img_shape” and “lazy”, added or modified keys are “imgs”, “lazy”. Required keys in “lazy” are “crop_bbox”, “interpolation”, “flip_direction”.

class mmaction.datasets.pipelines.GenerateLocalizationLabels[source]¶

Load video label for localizer with given video_name list.

Required keys are “duration_frame”, “duration_second”, “feature_frame”, “annotations”, added or modified keys are “gt_bbox”.

class mmaction.datasets.pipelines.ImageToTensor(keys)[source]¶

Convert image type to torch.Tensor type.

Parameters: keys (Sequence[str]) – Required keys to be converted.

class mmaction.datasets.pipelines.LoadLocalizationFeature(raw_feature_ext='.csv')[source]¶

Load video features for localizer with given video_name list.

Required keys are “video_name” and “data_prefix”, added or modified keys are “raw_feature”.

Parameters: raw_feature_ext (str) – Raw feature file extension. Default: ‘.csv’.

class mmaction.datasets.pipelines.LoadProposals(top_k, pgm_proposals_dir, pgm_features_dir, proposal_ext='.csv', feature_ext='.npy')[source]¶

Loading proposals with given proposal results.

Required keys are “video_name”, added or modified keys are ‘bsp_feature’, ‘tmin’, ‘tmax’, ‘tmin_score’, ‘tmax_score’ and ‘reference_temporal_iou’.

Parameters

top_k (int) – The top k proposals to be loaded.
pgm_proposals_dir (str) – Directory to load proposals.
pgm_features_dir (str) – Directory to load proposal features.
proposal_ext (str) – Proposal file extension. Default: ‘.csv’.
feature_ext (str) – Feature file extension. Default: ‘.npy’.

class mmaction.datasets.pipelines.MultiGroupCrop(crop_size, groups)[source]¶

Randomly crop the images into several groups.

Crop the random region with the same given crop_size and bounding box into several groups. Required keys are “imgs”, added or modified keys are “imgs”, “crop_bbox” and “img_shape”.

Parameters

crop_size (int | tuple[int]) – (w, h) of crop size.
groups (int) – Number of groups.

class mmaction.datasets.pipelines.MultiScaleCrop(input_size, scales=(1), max_wh_scale_gap=1, random_crop=False, num_fixed_crops=5, lazy=False)[source]¶

Crop images with a list of randomly selected scales.

Randomly select the w and h scales from a list of scales. A scale of 1 means the base size, which is the minimum of image width and height. The scale level of w and h is controlled to be smaller than a certain value to prevent too large or small aspect ratio. Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox”, “img_shape”, “lazy” and “scales”. Required keys in “lazy” are “crop_bbox”, added or modified key is “crop_bbox”.

Parameters

input_size (int | tuple[int]) – (w, h) of network input.
scales (tuple[float]) – Width and height scales to be selected.
max_wh_scale_gap (int) – Maximum gap of w and h scale levels. Default: 1.
random_crop (bool) – If set to True, the cropping bbox will be randomly sampled, otherwise it will be sampled from fixed regions. Default: False.
num_fixed_crops (int) – If set to 5, the cropping bbox will keep 5 basic fixed regions: “upper left”, “upper right”, “lower left”, “lower right”, “center”. If set to 13, the cropping bbox will append another 8 fixed regions: “center left”, “center right”, “lower center”, “upper center”, “upper left quarter”, “upper right quarter”, “lower left quarter”, “lower right quarter”. Default: 5.
lazy (bool) – Determine whether to apply lazy operation. Default: False.

class mmaction.datasets.pipelines.Normalize(mean, std, to_bgr=False, adjust_magnitude=False)[source]¶

Normalize images with the given mean and std value.

Required keys are “imgs”, “img_shape”, “modality”, added or modified keys are “imgs” and “img_norm_cfg”. If modality is ‘Flow’, the additional key “scale_factor” is required.

Parameters

mean (Sequence[float]) – Mean values of different channels.
std (Sequence[float]) – Std values of different channels.
to_bgr (bool) – Whether to convert channels from RGB to BGR. Default: False.
adjust_magnitude (bool) – Indicate whether to adjust the flow magnitude on ‘scale_factor’ when modality is ‘Flow’. Default: False.
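
The per-channel arithmetic can be sketched as below. This is a minimal illustration for RGB-like frames only, assuming (H, W, C) arrays and assuming that the channel reversal happens before the mean is subtracted; normalize_imgs is a hypothetical name, and the ‘Flow’ branch with adjust_magnitude is not covered.

```python
import numpy as np


def normalize_imgs(imgs, mean, std, to_bgr=False):
    """Return float32 frames normalized as (img - mean) / std per channel."""
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    out = []
    for img in imgs:
        img = img.astype(np.float32)
        if to_bgr:
            # Assumption: reverse the channel axis first, then normalize.
            img = img[..., ::-1]
        # mean and std broadcast over the trailing channel axis.
        out.append((img - mean) / std)
    return out
```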

class mmaction.datasets.pipelines.OpenCVDecode[source]¶

Using OpenCV to decode the video.

Required keys are “video_reader”, “filename” and “frame_inds”, added or modified keys are “imgs”, “img_shape” and “original_shape”.

class mmaction.datasets.pipelines.OpenCVInit(io_backend='disk', **kwargs)[source]¶

Using OpenCV to initialize the video_reader.

Required keys are “filename”, added or modified keys are “new_path”, “video_reader” and “total_frames”.

class mmaction.datasets.pipelines.PyAVDecode(multi_thread=False)[source]¶

Using pyav to decode the video.

PyAV: https://github.com/mikeboers/PyAV

Required keys are “video_reader” and “frame_inds”, added or modified keys are “imgs”, “img_shape” and “original_shape”.

Parameters: multi_thread (bool) – If set to True, it will apply multi thread processing. Default: False.

class mmaction.datasets.pipelines.PyAVInit(io_backend='disk', **kwargs)[source]¶

Using pyav to initialize the video.

PyAV: https://github.com/mikeboers/PyAV

Required keys are “filename”, added or modified keys are “video_reader”, and “total_frames”.

Parameters

io_backend (str) – IO backend where frames are stored. Default: ‘disk’.
kwargs (dict) – Args for file client.

class mmaction.datasets.pipelines.RandomCrop(size, lazy=False)[source]¶

Vanilla square random crop that specifies the output size.

Required keys in results are “imgs” and “img_shape”, added or modified keys are “imgs”, “lazy”; Required keys in “lazy” are “flip”, “crop_bbox”, added or modified key is “crop_bbox”.

Parameters

size (int) – The output size of the images.
lazy (bool) – Determine whether to apply lazy operation. Default: False.

class mmaction.datasets.pipelines.RandomResizedCrop(area_range=(0.08, 1.0), aspect_ratio_range=(0.75, 1.3333333333333333), lazy=False)[source]¶

Random crop that specifies the area and height-width ratio range.

Required keys in results are “imgs”, “img_shape”, “crop_bbox” and “lazy”, added or modified keys are “imgs”, “crop_bbox” and “lazy”; Required keys in “lazy” are “flip”, “crop_bbox”, added or modified key is “crop_bbox”.

Parameters

area_range (Tuple[float]) – The candidate area scales range of output cropped images. Default: (0.08, 1.0).
aspect_ratio_range (Tuple[float]) – The candidate aspect ratio range of output cropped images. Default: (3 / 4, 4 / 3).
lazy (bool) – Determine whether to apply lazy operation. Default: False.

static get_crop_bbox(img_shape, area_range, aspect_ratio_range, max_attempts=10)[source]¶

Get a crop bbox given the area range and aspect ratio range.

Parameters

img_shape (Tuple[int]) – Image shape
area_range (Tuple[float]) – The candidate area scales range of output cropped images. Default: (0.08, 1.0).
aspect_ratio_range (Tuple[float]) – The candidate aspect ratio range of output cropped images. Default: (3 / 4, 4 / 3).
max_attempts (int) – Maximum number of attempts to generate a random candidate bounding box. If no qualified box is found, the center bounding box will be used. Default: 10.

Returns

(list[int]) A random crop bbox within the area range and aspect ratio range.
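
The retry-then-fallback behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the library code: img_shape is taken to be (h, w), the box is returned as [left, top, right, bottom], the aspect ratio is sampled log-uniformly, and the fallback is a centered square on the shorter side. The name sample_crop_bbox and the rng argument are hypothetical.

```python
import math
import random


def sample_crop_bbox(img_shape, area_range=(0.08, 1.0),
                     aspect_ratio_range=(3 / 4, 4 / 3), max_attempts=10,
                     rng=random):
    """Sample [left, top, right, bottom] within the given ranges (sketch)."""
    img_h, img_w = img_shape
    area = img_h * img_w
    log_lo, log_hi = (math.log(r) for r in aspect_ratio_range)
    for _ in range(max_attempts):
        target_area = rng.uniform(*area_range) * area
        # Assumption: log-uniform sampling, so r and 1 / r are equally likely.
        ratio = math.exp(rng.uniform(log_lo, log_hi))
        crop_w = int(round(math.sqrt(target_area * ratio)))
        crop_h = int(round(math.sqrt(target_area / ratio)))
        if 0 < crop_w <= img_w and 0 < crop_h <= img_h:
            left = rng.randint(0, img_w - crop_w)
            top = rng.randint(0, img_h - crop_h)
            return [left, top, left + crop_w, top + crop_h]
    # No candidate fit inside the image: fall back to a center crop.
    size = min(img_h, img_w)
    left = (img_w - size) // 2
    top = (img_h - size) // 2
    return [left, top, left + size, top + size]
```

Passing max_attempts=0 skips sampling entirely and returns the center box, which makes the fallback easy to check in isolation.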

class mmaction.datasets.pipelines.RawFrameDecode(io_backend='disk', decoding_backend='cv2', **kwargs)[source]¶

Load and decode frames with given indices.

Required keys are “frame_dir”, “filename_tmpl” and “frame_inds”, added or modified keys are “imgs”, “img_shape” and “original_shape”.

Parameters

io_backend (str) – IO backend where frames are stored. Default: ‘disk’.
decoding_backend (str) – Backend used for image decoding. Default: ‘cv2’.
kwargs (dict, optional) – Arguments for FileClient.

class mmaction.datasets.pipelines.Resize(scale, keep_ratio=True, interpolation='bilinear', lazy=False)[source]¶

Resize images to a specific size.

Required keys are “imgs”, “img_shape”, “modality”, added or modified keys are “imgs”, “img_shape”, “keep_ratio”, “scale_factor”, “lazy”, “resize_size”. There are no required keys in “lazy”; added or modified key is “interpolation”.

Parameters

scale (float | Tuple[int]) – If keep_ratio is True, it serves as scaling factor or maximum size: If it is a float number, the image will be rescaled by this factor, else if it is a tuple of 2 integers, the image will be rescaled as large as possible within the scale. Otherwise, it serves as (w, h) of output size.
keep_ratio (bool) – If set to True, Images will be resized without changing the aspect ratio. Otherwise, it will resize images to a given size. Default: True.
interpolation (str) – Algorithm used for interpolation: “nearest” | “bilinear”. Default: “bilinear”.
lazy (bool) – Determine whether to apply lazy operation. Default: False.
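
The three meanings of scale can be made concrete with a small size calculation. The sketch below is an illustration only: target_size is a hypothetical helper, sizes are (w, h), and the rounding rule (add 0.5 and truncate) is an assumption.

```python
def target_size(old_size, scale, keep_ratio=True):
    """Compute the output (w, h) for an input of size old_size = (w, h)."""
    w, h = old_size
    if not keep_ratio:
        # scale is taken directly as the (w, h) of the output.
        return tuple(scale)
    if isinstance(scale, (int, float)):
        # A single number is a plain scaling factor.
        factor = scale
    else:
        # A pair is a bounding size: grow as large as possible within it.
        max_long, max_short = max(scale), min(scale)
        factor = min(max_long / max(w, h), max_short / min(w, h))
    return int(w * factor + 0.5), int(h * factor + 0.5)
```

For a 320 x 240 input, a scale of 0.5 gives (160, 120), a scale of (256, 256) with keep_ratio=True gives (256, 192), and the same tuple with keep_ratio=False gives (256, 256).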

class mmaction.datasets.pipelines.SampleFrames(clip_len, frame_interval=1, num_clips=1, temporal_jitter=False, twice_sample=False, out_of_bound_opt='loop', test_mode=False, start_index=None)[source]¶

Sample frames from the video.

Required keys are “filename”, “total_frames”, “start_index”, added or modified keys are “frame_inds”, “frame_interval” and “num_clips”.

Parameters

clip_len (int) – Frames of each sampled output clip.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.
num_clips (int) – Number of clips to be sampled. Default: 1.
temporal_jitter (bool) – Whether to apply temporal jittering. Default: False.
twice_sample (bool) – Whether to use twice sample when testing. If set to True, it will sample frames with and without fixed shift, which is commonly used for testing in TSM model. Default: False.
out_of_bound_opt (str) – The way to deal with out of bounds frame indexes. Available options are ‘loop’, ‘repeat_last’. Default: ‘loop’.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
start_index (None) – This argument is deprecated and moved to dataset class (BaseDataset, VideoDataset, RawframeDataset, etc), see this: https://github.com/open-mmlab/mmaction2/pull/89.
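
The two out_of_bound_opt choices can be illustrated on a plain index array. This sketch is one plausible reading, not the library code: ‘loop’ wraps indexes around the video, and ‘repeat_last’ is assumed here to clamp to the final frame of the video. The name fix_out_of_bound is hypothetical.

```python
import numpy as np


def fix_out_of_bound(frame_inds, total_frames, out_of_bound_opt="loop"):
    """Map out-of-range frame indexes back into [0, total_frames)."""
    frame_inds = np.asarray(frame_inds)
    if out_of_bound_opt == "loop":
        # Wrap around to the start of the video.
        return np.mod(frame_inds, total_frames)
    if out_of_bound_opt == "repeat_last":
        # Assumption: clamp every overshooting index to the last frame.
        return np.minimum(frame_inds, total_frames - 1)
    raise ValueError(f"unsupported out_of_bound_opt: {out_of_bound_opt}")
```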

class mmaction.datasets.pipelines.SampleProposalFrames(clip_len, body_segments, aug_segments, aug_ratio, frame_interval=1, test_interval=6, temporal_jitter=False, mode='train')[source]¶

Sample frames from proposals in the video.

Required keys are “total_frames” and “out_proposals”, added or modified keys are “frame_inds”, “frame_interval”, “num_clips”, ‘clip_len’ and ‘num_proposals’.

Parameters

clip_len (int) – Frames of each sampled output clip.
body_segments (int) – Number of segments in course period.
aug_segments (list[int]) – Number of segments in starting and ending period.
aug_ratio (int | float | tuple[int | float]) – The ratio of the length of augmentation to that of the proposal.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.
test_interval (int) – Temporal interval of adjacent sampled frames in test mode. Default: 6.
temporal_jitter (bool) – Whether to apply temporal jittering. Default: False.
mode (str) – Choose ‘train’, ‘val’ or ‘test’ mode. Default: ‘train’.

class mmaction.datasets.pipelines.TenCrop(crop_size)[source]¶

Crop the images into 10 crops (corner + center + flip).

Crop the four corners and the center part of the image with the same given crop_size, and flip it horizontally. Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox” and “img_shape”.

Parameters: crop_size (int | tuple[int]) – (w, h) of crop size.
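
A minimal sketch of the corner, center and flip layout on a single frame, assuming an array whose first two axes are (H, W) and a crop_size given as (w, h). The helper name ten_crop and the exact crop order are assumptions for illustration.

```python
import numpy as np


def ten_crop(img, crop_size):
    """Return 5 crops (4 corners + center) followed by their horizontal flips."""
    crop_w, crop_h = crop_size
    img_h, img_w = img.shape[:2]
    max_x, max_y = img_w - crop_w, img_h - crop_h
    # (x, y) offsets of the top-left corner of each crop.
    offsets = [
        (0, 0),                    # upper left
        (max_x, 0),                # upper right
        (0, max_y),                # lower left
        (max_x, max_y),            # lower right
        (max_x // 2, max_y // 2),  # center
    ]
    crops = [img[y:y + crop_h, x:x + crop_w] for x, y in offsets]
    flipped = [np.flip(crop, axis=1) for crop in crops]
    return crops + flipped
```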

class mmaction.datasets.pipelines.ThreeCrop(crop_size)[source]¶

Crop images into three crops.

Crop the images equally into three crops with equal intervals along the shorter side. Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox” and “img_shape”.

Parameters: crop_size (int | tuple[int]) – (w, h) of crop size.

class mmaction.datasets.pipelines.ToDataContainer(fields)[source]¶

Convert the data to DataContainer.

Parameters: fields (Sequence[dict]) – Required fields to be converted with keys and attributes. E.g. fields=(dict(key='gt_bbox', stack=False),).

class mmaction.datasets.pipelines.ToTensor(keys)[source]¶

Convert some values in results dict to torch.Tensor type in data loader pipeline.

Parameters: keys (Sequence[str]) – Required keys to be converted.

class mmaction.datasets.pipelines.Transpose(keys, order)[source]¶

Transpose image channels to a given order.

Parameters

keys (Sequence[str]) – Required keys to be converted.
order (Sequence[int]) – Image channel order.
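
The effect of order can be shown with numpy.transpose. The sketch assumes the values under keys are NumPy arrays; transpose_keys is a hypothetical stand-in for the transform.

```python
import numpy as np


def transpose_keys(results, keys, order):
    """Reorder the axes of results[key] for every key in keys."""
    for key in keys:
        results[key] = np.transpose(results[key], order)
    return results


# Example: move channels first, (H, W, C) -> (C, H, W).
results = transpose_keys({"imgs": np.zeros((4, 5, 3))}, ["imgs"], (2, 0, 1))
```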

class mmaction.datasets.pipelines.UntrimmedSampleFrames(clip_len=1, frame_interval=16, start_index=1)[source]¶

Sample frames from the untrimmed video.

Required keys are “filename”, “total_frames”, added or modified keys are “frame_inds”, “frame_interval” and “num_clips”.

Parameters

clip_len (int) – The length of sampled clips. Default: 1.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 16.
start_index (int) – Specify a start index for frames in consideration of different filename format. However, when taking videos as input, it should be set to 0, since frames loaded from videos count from 0. Default: 1.
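
How clip_len, frame_interval and start_index could interact is sketched below. The placement of clips is an assumption: one clip is centered in each consecutive window of frame_interval frames, and indexes are clipped to the valid range before start_index is added. The name untrimmed_frame_inds is hypothetical.

```python
import numpy as np


def untrimmed_frame_inds(total_frames, clip_len=1, frame_interval=16,
                         start_index=1):
    """Return flattened frame indexes for evenly spaced clips (sketch)."""
    # Assumed clip centers: the middle of each frame_interval window.
    clip_centers = np.arange(frame_interval // 2, total_frames, frame_interval)
    # Offsets of the clip_len frames around each center.
    offsets = np.arange(-(clip_len // 2), clip_len - clip_len // 2)
    frame_inds = clip_centers[:, None] + offsets[None, :]
    frame_inds = np.clip(frame_inds, 0, total_frames - 1)
    # Shift for datasets whose frame files start counting at start_index.
    return frame_inds.reshape(-1) + start_index
```

With start_index=0 (video input) a 40-frame video yields indexes 8 and 24; with the default start_index=1 (raw frames) the same call yields 9 and 25.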

samplers¶

class mmaction.datasets.samplers.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True)[source]¶

DistributedSampler inheriting from torch.utils.data.DistributedSampler.

In older versions of PyTorch, DistributedSampler has no shuffle argument. This child class adds one.
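
What a distributed sampler with an optional shuffle does can be sketched without PyTorch. The function below is an illustration only: shard_indices is a hypothetical name, the epoch-seeded shuffle is an assumption, and it expects dataset_len to be at least num_replicas.

```python
import math
import random


def shard_indices(dataset_len, num_replicas, rank, shuffle=True, epoch=0):
    """Return the dataset indexes assigned to one process (sketch)."""
    indices = list(range(dataset_len))
    if shuffle:
        # Seeding with the epoch gives every rank the same permutation.
        random.Random(epoch).shuffle(indices)
    num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas
    # Pad with leading indexes so every rank gets the same number of samples.
    indices += indices[: total_size - len(indices)]
    # Each rank takes every num_replicas-th index, starting at its own rank.
    return indices[rank:total_size:num_replicas]
```

With shuffle=False the assignment is deterministic: for 10 samples on 4 processes, rank 0 receives [0, 4, 8] and rank 3 receives [3, 7, 1], where the trailing 1 comes from padding.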

mmaction.utils¶

mmaction.utils.get_random_string(length=15)[source]¶

Get random string with letters and digits.

Parameters: length (int) – Length of random string. Default: 15.
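
An equivalent helper can be sketched with the standard library. The name random_string is hypothetical, and the alphabet (ASCII letters plus digits) is an assumption based on the description above.

```python
import random
import string


def random_string(length=15):
    """Return a random string of ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choices(alphabet, k=length))
```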

mmaction.utils.get_root_logger(log_file=None, log_level=20)[source]¶

Use get_logger method in mmcv to get the root logger.

The logger will be initialized if it has not been initialized. By default a StreamHandler will be added. If log_file is specified, a FileHandler will also be added. The name of the root logger is the top-level package name, e.g., “mmaction”.

Parameters

log_file (str | None) – The log filename. If specified, a FileHandler will be added to the root logger.
log_level (int) – The root logger level. Note that only the process of rank 0 is affected, while other processes will set the level to “Error” and be silent most of the time.

Returns

The root logger.

Return type

logging.Logger
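
The initialize-once behaviour can be sketched with the standard logging module. This is not the mmcv implementation: get_logger_sketch is a hypothetical name, the message format is an assumption, and the rank-dependent level is left out. Note that the default log_level=20 equals logging.INFO.

```python
import logging


def get_logger_sketch(name="mmaction", log_file=None, log_level=logging.INFO):
    """Return a named logger, attaching handlers only on first use."""
    logger = logging.getLogger(name)
    if logger.handlers:
        # Already initialized: reuse it instead of adding duplicate handlers.
        return logger
    handlers = [logging.StreamHandler()]
    if log_file is not None:
        handlers.append(logging.FileHandler(log_file, "w"))
    formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    for handler in handlers:
        handler.setFormatter(formatter)
        handler.setLevel(log_level)
        logger.addHandler(handler)
    logger.setLevel(log_level)
    return logger
```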

mmaction.utils.get_shm_dir()[source]¶: Get shm dir for temporary usage.

mmaction.utils.get_thread_id()[source]¶: Get current thread id.

mmaction.localization¶

mmaction.localization.eval_ap(detections, gt_by_cls, iou_range)[source]¶

Evaluate average precisions.

Parameters

detections (dict) – Results of detections.
gt_by_cls (dict) – Information of groundtruth.
iou_range (list) – Ranges of iou.

Returns

Average precision values of classes at ious.

Return type

list

mmaction.localization.generate_bsp_feature(video_list, video_infos, tem_results_dir, pgm_proposals_dir, top_k=1000, bsp_boundary_ratio=0.2, num_sample_start=8, num_sample_end=8, num_sample_action=16, num_sample_interp=3, tem_results_ext='.csv', pgm_proposal_ext='.csv', result_dict=None)[source]¶

Generate Boundary-Sensitive Proposal Feature with given proposals.

Parameters

video_list (list[int]) – List of video indexes to generate bsp_feature.
video_infos (list[dict]) – List of video_info dict that contains ‘video_name’.
tem_results_dir (str) – Directory to load temporal evaluation results.
pgm_proposals_dir (str) – Directory to load proposals.
top_k (int) – Number of proposals to be considered. Default: 1000.
bsp_boundary_ratio (float) – Ratio for proposal boundary (start/end). Default: 0.2.
num_sample_start (int) – Num of samples for actionness in start region. Default: 8.
num_sample_end (int) – Num of samples for actionness in end region. Default: 8.
num_sample_action (int) – Num of samples for actionness in center region. Default: 16.
num_sample_interp (int) – Num of samples for interpolation for each sample point. Default: 3.
tem_results_ext (str) – File extension for temporal evaluation model output. Default: ‘.csv’.
pgm_proposal_ext (str) – File extension for proposals. Default: ‘.csv’.
result_dict (dict) – The dict to save the results. Default: None.

Returns

A dict containing video_name as keys and bsp_feature as values. If result_dict is not None, the results are saved to it.

Return type

bsp_feature_dict (dict)

mmaction.localization.generate_candidate_proposals(video_list, video_infos, tem_results_dir, temporal_scale, peak_threshold, tem_results_ext='.csv', result_dict=None)[source]¶

Generate candidate proposals with given temporal evaluation results. Each proposal file will contain: ‘tmin,tmax,tmin_score,tmax_score,score,match_iou,match_ioa’.

Parameters

video_list (list[int]) – List of video indexes to generate proposals.
video_infos (list[dict]) – List of video_info dict that contains ‘video_name’, ‘duration_frame’, ‘duration_second’, ‘feature_frame’, and ‘annotations’.
tem_results_dir (str) – Directory to load temporal evaluation results.
temporal_scale (int) – The number (scale) on temporal axis.
peak_threshold (float) – The threshold for proposal generation.
tem_results_ext (str) – File extension for temporal evaluation model output. Default: ‘.csv’.
result_dict (dict) – The dict to save the results. Default: None.

Returns

A dict containing video_name as keys and proposal lists as values. If result_dict is not None, the results are saved to it.

Return type

dict

mmaction.localization.load_localize_proposal_file(filename)[source]¶

Load the proposal file and split it into many parts which contain one video’s information separately.

Parameters: filename (str) – Path to the proposal file.
Returns: List of all videos’ information.
Return type: list

mmaction.localization.perform_regression(detections)[source]¶

Perform regression on detection results.

Parameters: detections (list) – Detection results before regression.
Returns: Detection results after regression.
Return type: list

mmaction.localization.soft_nms(proposals, alpha, low_threshold, high_threshold, top_k)[source]¶

Soft NMS for temporal proposals.

Parameters

proposals (np.ndarray) – Proposals generated by network.
alpha (float) – Alpha value of Gaussian decaying function.
low_threshold (float) – Low threshold for soft nms.
high_threshold (float) – High threshold for soft nms.
top_k (int) – Top k values to be considered.

Returns

The updated proposals.

Return type

new_proposals (np.ndarray)
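
The Gaussian decay idea behind soft NMS can be sketched as follows. This is a simplified illustration, not the library function: proposals are assumed to be rows of [t_min, t_max, score] with positive durations, and a single hypothetical iou_threshold replaces the low_threshold and high_threshold pair, since this reference does not say how the two are combined.

```python
import numpy as np


def soft_nms_sketch(proposals, alpha, iou_threshold, top_k):
    """Greedy soft NMS with Gaussian score decay on temporal segments."""
    remaining = np.array(proposals, dtype=float)  # work on a copy
    kept = []
    while len(remaining) > 0 and len(kept) < top_k:
        best = np.argmax(remaining[:, 2])
        t_min, t_max, score = remaining[best]
        kept.append([t_min, t_max, score])
        remaining = np.delete(remaining, best, axis=0)
        if len(remaining) == 0:
            break
        # Temporal IoU between the selected segment and the rest.
        inter = np.clip(np.minimum(remaining[:, 1], t_max)
                        - np.maximum(remaining[:, 0], t_min), 0, None)
        union = (remaining[:, 1] - remaining[:, 0]) + (t_max - t_min) - inter
        iou = inter / union
        # Decay overlapping scores instead of discarding the proposals.
        decay = np.where(iou > iou_threshold,
                         np.exp(-np.square(iou) / alpha), 1.0)
        remaining[:, 2] *= decay
    return np.array(kept)
```

Unlike hard NMS, a heavily overlapping proposal is kept with a reduced score, so it moves down the ranking rather than disappearing.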

mmaction.localization.temporal_iop(proposal_min, proposal_max, gt_min, gt_max)[source]¶

Compute IoP score between a groundtruth bbox and the proposals.

Compute the IoP which is defined as the overlap ratio with groundtruth proportional to the duration of this proposal.

Parameters

proposal_min (list[float]) – List of temporal anchor min.
proposal_max (list[float]) – List of temporal anchor max.
gt_min (float) – Groundtruth temporal box min.
gt_max (float) – Groundtruth temporal box max.

Returns

List of intersection over anchor scores.

Return type

scores (list[float])
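
The definition above amounts to intersection length divided by proposal length. A minimal NumPy sketch, assuming every proposal has positive duration (temporal_iop_sketch is a hypothetical name):

```python
import numpy as np


def temporal_iop_sketch(proposal_min, proposal_max, gt_min, gt_max):
    """Intersection with the groundtruth divided by each proposal's length."""
    proposal_min = np.asarray(proposal_min, dtype=float)
    proposal_max = np.asarray(proposal_max, dtype=float)
    # Overlap is zero when the segments do not intersect.
    inter = np.maximum(
        np.minimum(proposal_max, gt_max) - np.maximum(proposal_min, gt_min),
        0.0)
    return inter / (proposal_max - proposal_min)
```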

mmaction.localization.temporal_iou(proposal_min, proposal_max, gt_min, gt_max)[source]¶

Compute IoU score between a groundtruth bbox and the proposals.

Parameters

proposal_min (list[float]) – List of temporal anchor min.
proposal_max (list[float]) – List of temporal anchor max.
gt_min (float) – Groundtruth temporal box min.
gt_max (float) – Groundtruth temporal box max.

Returns

List of iou scores.

Return type

jaccard (list[float])
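
For comparison with IoP, temporal IoU divides the same intersection by the union of the two segments. A minimal NumPy sketch, assuming segments of positive duration (temporal_iou_sketch is a hypothetical name):

```python
import numpy as np


def temporal_iou_sketch(proposal_min, proposal_max, gt_min, gt_max):
    """Intersection over union between each proposal and one groundtruth."""
    proposal_min = np.asarray(proposal_min, dtype=float)
    proposal_max = np.asarray(proposal_max, dtype=float)
    inter = np.maximum(
        np.minimum(proposal_max, gt_max) - np.maximum(proposal_min, gt_min),
        0.0)
    # Union = sum of both lengths minus the shared part.
    union = (proposal_max - proposal_min) + (gt_max - gt_min) - inter
    return inter / union
```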

mmaction.localization.temporal_nms(detections, threshold)[source]¶

Perform temporal NMS on detection results.

Parameters

detections (list) – Detection results before NMS.
threshold (float) – Threshold of NMS.

Returns

Detection results after NMS.

Return type

list
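
Greedy temporal NMS can be sketched as below. The layout of detections is not specified in this reference, so the sketch assumes rows of [t_min, t_max, score] with positive durations; temporal_nms_sketch is a hypothetical name.

```python
import numpy as np


def temporal_nms_sketch(detections, threshold):
    """Keep the best-scoring segments, dropping those with IoU > threshold."""
    dets = np.asarray(detections, dtype=float)
    if dets.size == 0:
        return dets
    order = dets[:, 2].argsort()[::-1]  # indexes sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        inter = np.maximum(
            np.minimum(dets[rest, 1], dets[i, 1])
            - np.maximum(dets[rest, 0], dets[i, 0]), 0.0)
        union = ((dets[rest, 1] - dets[rest, 0])
                 + (dets[i, 1] - dets[i, 0]) - inter)
        # Suppress everything that overlaps the selected segment too much.
        order = rest[inter / union <= threshold]
    return dets[keep]
```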