API Reference¶
mmaction.core¶
optimizer¶
-
class
mmaction.core.optimizer.CopyOfSGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)[source]¶ A clone of torch.optim.SGD.
A customized optimizer could be defined like CopyOfSGD. You may derive from built-in optimizers in torch.optim, or directly implement a new optimizer.
-
class
mmaction.core.optimizer.TSMOptimizerConstructor(optimizer_cfg, paramwise_cfg=None)[source]¶ Optimizer constructor in TSM model.
This constructor builds optimizer in different ways from the default one.
Parameters of the first conv layer have default lr and weight decay.
Parameters of BN layers have default lr and zero weight decay.
If the field “fc_lr5” in paramwise_cfg is set to True, the parameters of the last fc layer in cls_head have 5x lr multiplier and 10x weight decay multiplier.
Weights of other layers have default lr and weight decay, and biases have a 2x lr multiplier and zero weight decay.
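A config fragment along the following lines would select this constructor. It is a minimal sketch, assuming the usual config-dict convention in which the `constructor` key names the constructor class and `paramwise_cfg` is passed through to it; the `lr`, `momentum` and `weight_decay` values are placeholders, not recommended settings.

```python
# Illustrative optimizer config. The numeric values are placeholders chosen
# for the example, and the `constructor` key is assumed to follow the usual
# config-dict convention.
optimizer = dict(
    type='SGD',
    constructor='TSMOptimizerConstructor',
    paramwise_cfg=dict(fc_lr5=True),  # enable the multipliers for the last fc layer
    lr=0.01,
    momentum=0.9,
    weight_decay=0.0001,
)
```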
evaluation¶
-
class
mmaction.core.evaluation.DistEvalHook(dataloader, interval=1, gpu_collect=False, save_best=True, key_indicator='top1_acc', rule=None, **eval_kwargs)[source]¶ Distributed evaluation hook.
This hook will regularly perform evaluation in a given interval when performing in distributed environment.
- Parameters
dataloader (DataLoader) – A PyTorch dataloader.
interval (int) – Evaluation interval (by epochs). Default: 1.
gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.
save_best (bool) – Whether to save best checkpoint during evaluation. Default: True.
key_indicator (str | None) – Key indicator to measure the best checkpoint during evaluation when save_best is set to True. Options are the evaluation metrics of the test dataset, e.g., top1_acc, top5_acc, mean_class_accuracy, mean_average_precision for action recognition datasets (RawframeDataset and VideoDataset), and AR@AN, auc for action localization datasets (ActivityNetDataset). Default: top1_acc.
rule (str | None) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Default: None.
eval_kwargs (dict, optional) – Arguments for evaluation.
-
class
mmaction.core.evaluation.EvalHook(dataloader, interval=1, gpu_collect=False, save_best=True, key_indicator='top1_acc', rule=None, **eval_kwargs)[source]¶ Non-Distributed evaluation hook.
This hook will regularly perform evaluation in a given interval when performing in non-distributed environment.
- Parameters
dataloader (DataLoader) – A PyTorch dataloader.
interval (int) – Evaluation interval (by epochs). Default: 1.
gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.
save_best (bool) – Whether to save best checkpoint during evaluation. Default: True.
key_indicator (str | None) – Key indicator to measure the best checkpoint during evaluation when save_best is set to True. Options are the evaluation metrics of the test dataset, e.g., top1_acc, top5_acc, mean_class_accuracy, mean_average_precision for action recognition datasets (RawframeDataset and VideoDataset), and AR@AN, auc for action localization datasets (ActivityNetDataset). Default: top1_acc.
rule (str | None) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Default: None.
eval_kwargs (dict, optional) – Arguments for evaluation.
-
mmaction.core.evaluation.average_precision_at_temporal_iou(ground_truth, prediction, temporal_iou_thresholds=array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]))[source]¶ Compute average precision (in detection task) between ground truth and predicted data frames. If multiple predictions match the same predicted segment, only the one with highest score is matched as true positive. This code is greatly inspired by Pascal VOC devkit.
- Parameters
ground_truth (dict) – Dict containing the ground truth instances. Key: ‘video_id’ Value (np.ndarray): 1D array of ‘t-start’ and ‘t-end’.
prediction (np.ndarray) – 2D array containing the information of proposal instances, including ‘video_id’, ‘class_id’, ‘t-start’, ‘t-end’ and ‘score’.
temporal_iou_thresholds (np.ndarray) – 1D array with temporal_iou thresholds. Default: np.linspace(0.5, 0.95, 10).
- Returns
1D array of average precision score.
- Return type
np.ndarray
-
mmaction.core.evaluation.average_recall_at_avg_proposals(ground_truth, proposals, total_num_proposals, max_avg_proposals=None, temporal_iou_thresholds=array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]))[source]¶ Computes the average recall given an average number (percentile) of proposals per video.
- Parameters
ground_truth (dict) – Dict containing the ground truth instances.
proposals (dict) – Dict containing the proposal instances.
total_num_proposals (int) – Total number of proposals in the proposal dict.
max_avg_proposals (int | None) – Max number of proposals for one video. Default: None.
temporal_iou_thresholds (np.ndarray) – 1D array with temporal_iou thresholds. Default: np.linspace(0.5, 0.95, 10).
- Returns
(recall, average_recall, proposals_per_video, auc). In recall, recall[i,j] is the recall at the i-th temporal_iou threshold and the j-th average number (percentile) of proposals per video. The average_recall is recall averaged over a list of temporal_iou thresholds (1D array). This is equivalent to recall.mean(axis=0). The proposals_per_video is the average number of proposals per video. The auc is the area under the AR@AN curve.
- Return type
tuple([np.ndarray, np.ndarray, np.ndarray, float])
-
mmaction.core.evaluation.confusion_matrix(y_pred, y_real, normalize=None)[source]¶ Compute confusion matrix.
- Parameters
y_pred (list[int] | np.ndarray[int]) – Prediction labels.
y_real (list[int] | np.ndarray[int]) – Ground truth labels.
normalize (str | None) – Normalizes confusion matrix over the true (rows), predicted (columns) conditions or all the population. If None, confusion matrix will not be normalized. Options are “true”, “pred”, “all”, None. Default: None.
- Returns
Confusion matrix.
- Return type
np.ndarray
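The normalization options can be illustrated with a small pure-Python sketch. It is not the library implementation: the function name `confusion_matrix_sketch` is hypothetical, the matrix is a list of lists rather than an np.ndarray, and guarding empty rows or columns with a divisor of 1 is an assumption.

```python
def confusion_matrix_sketch(y_pred, y_real, normalize=None):
    """Count (truth, prediction) pairs; rows are truth, columns are predictions."""
    labels = sorted(set(y_pred) | set(y_real))
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    matrix = [[0.0] * n for _ in range(n)]
    for pred, real in zip(y_pred, y_real):
        matrix[index[real]][index[pred]] += 1

    if normalize == 'true':    # each row sums to 1
        matrix = [[v / max(sum(row), 1) for v in row] for row in matrix]
    elif normalize == 'pred':  # each column sums to 1
        col_sums = [max(sum(row[j] for row in matrix), 1) for j in range(n)]
        matrix = [[row[j] / col_sums[j] for j in range(n)] for row in matrix]
    elif normalize == 'all':   # the whole matrix sums to 1
        total = max(sum(map(sum, matrix)), 1)
        matrix = [[v / total for v in row] for row in matrix]
    return matrix


y_pred = [0, 1, 1, 2]
y_real = [0, 1, 2, 2]
cm = confusion_matrix_sketch(y_pred, y_real)
cm_true = confusion_matrix_sketch(y_pred, y_real, normalize='true')
```

With normalize set to “true”, the diagonal of the result holds the per-class recall.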
-
mmaction.core.evaluation.get_weighted_score(score_list, coeff_list)[source]¶ Get weighted score with given scores and coefficients.
Given n predictions by different classifier: [score_1, score_2, …, score_n] (score_list) and their coefficients: [coeff_1, coeff_2, …, coeff_n] (coeff_list), return weighted score: weighted_score = score_1 * coeff_1 + score_2 * coeff_2 + … + score_n * coeff_n
- Parameters
score_list (list[list[np.ndarray]]) – List of list of scores, with shape n(number of predictions) X num_samples X num_classes
coeff_list (list[float]) – List of coefficients, with shape n.
- Returns
List of weighted scores.
- Return type
list[np.ndarray]
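The weighted sum described above can be sketched in pure Python. The name `weighted_score_sketch` and the sample scores are hypothetical, and nested lists stand in for the np.ndarray inputs.

```python
def weighted_score_sketch(score_list, coeff_list):
    """Return sum_i(score_i * coeff_i) for every sample and class."""
    assert len(score_list) == len(coeff_list)
    num_samples = len(score_list[0])
    num_classes = len(score_list[0][0])
    combined = []
    for s in range(num_samples):
        combined.append([
            sum(coeff * scores[s][c]
                for scores, coeff in zip(score_list, coeff_list))
            for c in range(num_classes)
        ])
    return combined


# Two classifiers, two samples, two classes (made-up numbers).
rgb = [[0.8, 0.2], [0.4, 0.6]]
flow = [[0.6, 0.4], [0.1, 0.9]]
fused = weighted_score_sketch([rgb, flow], [1.0, 0.5])
```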
-
mmaction.core.evaluation.mean_average_precision(scores, labels)[source]¶ Mean average precision for multi-label recognition.
- Parameters
scores (list[np.ndarray]) – Prediction scores for each class.
labels (list[np.ndarray]) – Ground truth many-hot vector.
- Returns
The mean average precision.
- Return type
np.float
-
mmaction.core.evaluation.mean_class_accuracy(scores, labels)[source]¶ Calculate mean class accuracy.
- Parameters
scores (list[np.ndarray]) – Prediction scores for each class.
labels (list[int]) – Ground truth labels.
- Returns
Mean class accuracy.
- Return type
np.ndarray
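As a rough illustration, mean class accuracy averages the per-class accuracies, so every class contributes equally regardless of how many samples it has. The sketch below uses a hypothetical name and plain lists, and it assumes that classes without any ground-truth sample are skipped; the library may treat that case differently.

```python
def mean_class_accuracy_sketch(scores, labels):
    """Average of per-class accuracy over the classes present in labels."""
    correct, total = {}, {}
    for sample_scores, label in zip(scores, labels):
        # Predicted class is the index of the highest score.
        pred = max(range(len(sample_scores)), key=lambda c: sample_scores[c])
        total[label] = total.get(label, 0) + 1
        correct[label] = correct.get(label, 0) + (pred == label)
    per_class = [correct[c] / total[c] for c in sorted(total)]
    return sum(per_class) / len(per_class)


scores = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
labels = [0, 0, 1, 1]
mca = mean_class_accuracy_sketch(scores, labels)
```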
-
mmaction.core.evaluation.pairwise_temporal_iou(candidate_segments, target_segments)[source]¶ Compute intersection over union between segments.
- Parameters
candidate_segments (np.ndarray) – 1-dim/2-dim array in format [init, end]/[m x 2:=[init, end]].
target_segments (np.ndarray) – 2-dim array in format [n x 2:=[init, end]].
- Returns
t_iou: 1-dim array [n] / 2-dim array [n x m] with IoU ratio.
- Return type
np.ndarray
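For the 1-dim candidate case, the computation can be sketched as follows. This is a pure-Python illustration under a hypothetical name, not the vectorized library code; returning 0 for a zero-length union is an assumption.

```python
def temporal_iou_sketch(candidate, targets):
    """IoU between one [init, end] segment and each of n target segments."""
    c_start, c_end = candidate
    ious = []
    for t_start, t_end in targets:
        # Overlap is clipped at zero for disjoint segments.
        intersection = max(0.0, min(c_end, t_end) - max(c_start, t_start))
        union = (c_end - c_start) + (t_end - t_start) - intersection
        ious.append(intersection / union if union > 0 else 0.0)
    return ious


ious = temporal_iou_sketch([2.0, 6.0], [[4.0, 8.0], [6.0, 9.0], [2.0, 6.0]])
```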
-
mmaction.core.evaluation.softmax(x, dim=1)[source]¶ Compute softmax values for each set of scores in x.
-
mmaction.core.evaluation.top_k_accuracy(scores, labels, topk=(1))[source]¶ Calculate top k accuracy score.
- Parameters
scores (list[np.ndarray]) – Prediction scores for each class.
labels (list[int]) – Ground truth labels.
topk (tuple[int]) – K value for top_k_accuracy. Default: (1, ).
- Returns
Top k accuracy score for each k.
- Return type
list[float]
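The metric can be sketched in pure Python as below: a sample counts as a hit for a given k if its ground truth label is among the k highest-scoring classes. The function name and the example scores are hypothetical.

```python
def top_k_accuracy_sketch(scores, labels, topk=(1,)):
    """Fraction of samples whose label is among the k best-scored classes."""
    results = []
    for k in topk:
        hits = 0
        for sample_scores, label in zip(scores, labels):
            # Class indices sorted by descending score.
            ranked = sorted(range(len(sample_scores)),
                            key=lambda c: sample_scores[c], reverse=True)
            hits += label in ranked[:k]
        results.append(hits / len(labels))
    return results


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
acc = top_k_accuracy_sketch(scores, labels, topk=(1, 2))
```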
fp16¶
dist utils¶
-
mmaction.core.dist_utils.allreduce_grads(params, coalesce=True, bucket_size_mb=-1)[source]¶ Allreduce gradients.
- Parameters
params (list[torch.Parameters]) – List of parameters of a model
coalesce (bool, optional) – Whether to allreduce parameters as a whole. Default: True.
bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Default: -1.
mmaction.datasets¶
datasets¶
-
class
mmaction.datasets.ActivityNetDataset(ann_file, pipeline, data_prefix=None, test_mode=False)[source]¶ ActivityNet dataset for temporal action localization.
The dataset loads raw features and applies specified transforms to return a dict containing the frame tensors and other information.
The ann_file is a json file with multiple objects. Each object has the name of a video as its key, and its value holds the total frames of the video, total seconds of the video, annotations of the video, feature frames (frames covered by features) of the video, fps and rfps. Example of an annotation file:
    {
        "v_--1DO2V4K74": {
            "duration_second": 211.53,
            "duration_frame": 6337,
            "annotations": [
                {
                    "segment": [
                        30.025882995319815,
                        205.2318595943838
                    ],
                    "label": "Rock climbing"
                }
            ],
            "feature_frame": 6336,
            "fps": 30.0,
            "rfps": 29.9579255898
        },
        "v_--6bJUbfpnQ": {
            "duration_second": 26.75,
            "duration_frame": 647,
            "annotations": [
                {
                    "segment": [
                        2.578755070202808,
                        24.914101404056165
                    ],
                    "label": "Drinking beer"
                }
            ],
            "feature_frame": 624,
            "fps": 24.0,
            "rfps": 24.1869158879
        },
        ...
    }
- Parameters
ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
data_prefix (str) – Path to a directory where videos are held. Default: None.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
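To make the annotation layout concrete, the sketch below reads one entry of the example file with the standard json module and flattens it into a list of per-video dicts. It is an illustration, not the dataset's actual loader: the helper name `load_video_infos` is hypothetical, and storing the video name under a “video_name” key is an assumption made here.

```python
import json

# One entry copied from the example annotation file above.
ann_text = '''
{
  "v_--1DO2V4K74": {
    "duration_second": 211.53,
    "duration_frame": 6337,
    "annotations": [
      {"segment": [30.025882995319815, 205.2318595943838],
       "label": "Rock climbing"}
    ],
    "feature_frame": 6336,
    "fps": 30.0,
    "rfps": 29.9579255898
  }
}
'''


def load_video_infos(text):
    """Turn the name -> info mapping into a list of per-video dicts."""
    video_infos = []
    for video_name, info in json.loads(text).items():
        video_info = dict(info)
        video_info['video_name'] = video_name  # assumed key name
        video_infos.append(video_info)
    return video_infos


video_infos = load_video_infos(ann_text)
```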
-
dump_results(results, out, output_format, version='VERSION 1.3')[source]¶ Dump data to json/csv files.
-
evaluate(results, metrics='AR@AN', max_avg_proposals=100, temporal_iou_thresholds=array([0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95]), logger=None)[source]¶ Evaluation in feature dataset.
- Parameters
results (list[dict]) – Output results.
metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘AR@AN’.
max_avg_proposals (int) – Max number of proposals to evaluate. Defaults: 100.
temporal_iou_thresholds (list) – Temporal IoU threshold for positive samples. Defaults: np.linspace(0.5, 0.95, 10).
logger (logging.Logger | None) – Training logger. Defaults: None.
- Returns
Evaluation results for evaluation metrics.
- Return type
dict
-
proposals2json(results, show_progress=False)[source]¶ Convert all proposals to a final dict(json) format.
- Parameters
results (list[dict]) – All proposals.
show_progress (bool) – Whether to show the progress bar. Defaults: False.
- Returns
The final result dict. E.g.
dict(video-1=[dict(segment=[1.1, 2.0], score=0.9), dict(segment=[50.1, 129.3], score=0.6)])
- Return type
dict
-
class
mmaction.datasets.BaseDataset(ann_file, pipeline, data_prefix=None, test_mode=False, multi_class=False, num_classes=None, start_index=1, modality='RGB')[source]¶ Base class for datasets.
All datasets to process video should subclass it. All subclasses should override:
Methods: load_annotations, supporting loading information from an annotation file.
Methods: prepare_train_frames, providing train data.
Methods: prepare_test_frames, providing test data.
- Parameters
ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
data_prefix (str) – Path to a directory where videos are held. Default: None.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
multi_class (bool) – Determines whether the dataset is a multi-class dataset. Default: False.
num_classes (int) – Number of classes of the dataset, used in multi-class datasets. Default: None.
start_index (int) – Specify a start index for frames in consideration of different filename format. However, when taking videos as input, it should be set to 0, since frames loaded from videos count from 0. Default: 1.
modality (str) – Modality of data. Support ‘RGB’, ‘Flow’. Default: ‘RGB’.
-
class
mmaction.datasets.RawframeDataset(ann_file, pipeline, data_prefix=None, test_mode=False, filename_tmpl='img_{:05}.jpg', with_offset=False, multi_class=False, num_classes=None, start_index=1, modality='RGB')[source]¶ Rawframe dataset for action recognition.
The dataset loads raw frames and applies specified transforms to return a dict containing the frame tensors and other information.
The ann_file is a text file with multiple lines, and each line indicates the directory to frames of a video, total frames of the video and the label of a video, which are split with a whitespace. Example of an annotation file:
    some/directory-1 163 1
    some/directory-2 122 1
    some/directory-3 258 2
    some/directory-4 234 2
    some/directory-5 295 3
    some/directory-6 121 3
Example of a multi-class annotation file:
    some/directory-1 163 1 3 5
    some/directory-2 122 1 2
    some/directory-3 258 2
    some/directory-4 234 2 4 6 8
    some/directory-5 295 3
    some/directory-6 121 3
Example of a with_offset annotation file (clips from long videos). Each line indicates the directory to frames of a video, the index of the start frame, total frames of the video clip and the label of a video clip, which are split with a whitespace:
    some/directory-1 12 163 3
    some/directory-2 213 122 4
    some/directory-3 100 258 5
    some/directory-4 98 234 2
    some/directory-5 0 295 3
    some/directory-6 50 121 3
- Parameters
ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
data_prefix (str) – Path to a directory where videos are held. Default: None.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
filename_tmpl (str) – Template for each filename. Default: ‘img_{:05}.jpg’.
with_offset (bool) – Determines whether the offset information is in ann_file. Default: False.
multi_class (bool) – Determines whether it is a multi-class recognition dataset. Default: False.
num_classes (int) – Number of classes in the dataset. Default: None.
modality (str) – Modality of data. Support ‘RGB’, ‘Flow’. Default: ‘RGB’.
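The three annotation layouts described above (plain, multi-class and with_offset) differ only in which whitespace-separated fields are present. A minimal parsing sketch follows; the function name and the dict keys (“frame_dir”, “offset”, “total_frames”, “label”) are hypothetical names chosen for illustration, not necessarily those used by the dataset class.

```python
def parse_rawframe_line(line, with_offset=False, multi_class=False):
    """Split one annotation line into its fields."""
    fields = line.split()
    info = {'frame_dir': fields[0]}
    idx = 1
    if with_offset:
        # with_offset files carry the start frame index before the frame count.
        info['offset'] = int(fields[idx])
        idx += 1
    info['total_frames'] = int(fields[idx])
    idx += 1
    labels = [int(x) for x in fields[idx:]]
    # Multi-class files list one or more labels; otherwise exactly one.
    info['label'] = labels if multi_class else labels[0]
    return info


plain = parse_rawframe_line('some/directory-1 163 1')
multi = parse_rawframe_line('some/directory-4 234 2 4 6 8', multi_class=True)
clip = parse_rawframe_line('some/directory-1 12 163 3', with_offset=True)
```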
-
evaluate(results, metrics='top_k_accuracy', topk=(1, 5), logger=None)[source]¶ Evaluation in rawframe dataset.
- Parameters
results (list) – Output results.
metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘top_k_accuracy’.
topk (int | tuple[int]) – K value for top_k_accuracy metric. Defaults: (1, 5).
logger (logging.Logger | None) – Training logger. Defaults: None.
- Returns
Evaluation results dict.
- Return type
dict
-
class
mmaction.datasets.RepeatDataset(dataset, times)[source]¶ A wrapper of repeated dataset.
The length of the repeated dataset will be times larger than the original dataset. This is useful when the data loading time is long but the dataset is small. Using RepeatDataset can reduce the data loading time between epochs.
- Parameters
dataset (Dataset) – The dataset to be repeated.
times (int) – Repeat times.
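The wrapping idea can be sketched as a class that reports a length multiplied by times and maps every index back into the original dataset. This is an illustrative sketch under the hypothetical name `RepeatDatasetSketch`; the real class may differ in detail.

```python
class RepeatDatasetSketch:
    """Present `dataset` as if it were `times` copies laid end to end."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __getitem__(self, idx):
        # Wrap the index around the original dataset length.
        return self.dataset[idx % self._ori_len]

    def __len__(self):
        return self.times * self._ori_len


repeated = RepeatDatasetSketch(['a', 'b', 'c'], times=3)
```

Because one epoch now covers the data several times, per-epoch setup costs are paid less often for the same number of samples seen.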
-
class
mmaction.datasets.SSNDataset(ann_file, pipeline, train_cfg, test_cfg, data_prefix, test_mode=False, filename_tmpl='img_{:05d}.jpg', start_index=1, modality='RGB', video_centric=True, reg_normalize_constants=None, body_segments=5, aug_segments=(2, 2), aug_ratio=(0.5, 0.5), clip_len=1, frame_interval=1, filter_gt=True, use_regression=True, verbose=False)[source]¶ Proposal frame dataset for Structured Segment Networks.
Based on proposal information, the dataset loads raw frames and applies specified transforms to return a dict containing the frame tensors and other information.
The ann_file is a text file with multiple lines and each video’s information takes up several lines. This file can be a normalized file with percent or standard file with specific frame indexes. If the file is a normalized file, it will be converted into a standard file first.
Template information of a video in a standard file:
    # index
    video_id
    num_frames
    fps
    num_gts
    label, start_frame, end_frame
    label, start_frame, end_frame
    …
    num_proposals
    label, best_iou, overlap_self, start_frame, end_frame
    label, best_iou, overlap_self, start_frame, end_frame
    …
Example of a standard annotation file:
    # 0
    video_validation_0000202
    5666
    1
    3
    8 130 185
    8 832 1136
    8 1303 1381
    5
    8 0.0620 0.0620 790 5671
    8 0.1656 0.1656 790 2619
    8 0.0833 0.0833 3945 5671
    8 0.0960 0.0960 4173 5671
    8 0.0614 0.0614 3327 5671
- Parameters
ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
train_cfg (dict) – Config for training.
test_cfg (dict) – Config for testing.
data_prefix (str) – Path to a directory where videos are held.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
filename_tmpl (str) – Template for each filename. Default: ‘img_{:05d}.jpg’.
start_index (int) – Specify a start index for frames in consideration of different filename format. Default: 1.
modality (str) – Modality of data. Support ‘RGB’, ‘Flow’. Default: ‘RGB’.
video_centric (bool) – Whether to sample proposals just from this video or sample proposals randomly from the entire dataset. Default: True.
reg_normalize_constants (list) – Regression target normalized constants, including mean and standard deviation of location and duration.
body_segments (int) – Number of segments in course period. Default: 5.
aug_segments (list[int]) – Number of segments in starting and ending period. Default: (2, 2).
aug_ratio (int | float | tuple[int | float]) – The ratio of the length of augmentation to that of the proposal. Default: (0.5, 0.5).
clip_len (int) – Frames of each sampled output clip. Default: 1.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.
filter_gt (bool) – Whether to filter videos with no annotation during training. Default: True.
use_regression (bool) – Whether to perform regression. Default: True.
verbose (bool) – Whether to print full information or not. Default: False.
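The standard file layout described above can be read record by record. The following sketch parses the example record; it assumes one field group per line and whitespace-separated values, as in the example. The function name and the keys of the returned dict are hypothetical.

```python
def parse_ssn_record(lines):
    """Parse the lines of one video record from a standard SSN file."""
    it = iter(lines)
    record = {
        'index': int(next(it).lstrip('#').strip()),
        'video_id': next(it).strip(),
        'num_frames': int(next(it)),
        'fps': float(next(it)),
    }
    # Ground truth block: a count followed by that many lines.
    record['gts'] = []
    for _ in range(int(next(it))):
        label, start, end = next(it).split()
        record['gts'].append((int(label), int(start), int(end)))
    # Proposal block: a count followed by that many lines.
    record['proposals'] = []
    for _ in range(int(next(it))):
        label, best_iou, overlap_self, start, end = next(it).split()
        record['proposals'].append(
            (int(label), float(best_iou), float(overlap_self),
             int(start), int(end)))
    return record


example = """# 0
video_validation_0000202
5666
1
3
8 130 185
8 832 1136
8 1303 1381
5
8 0.0620 0.0620 790 5671
8 0.1656 0.1656 790 2619
8 0.0833 0.0833 3945 5671
8 0.0960 0.0960 4173 5671
8 0.0614 0.0614 3327 5671""".splitlines()

record = parse_ssn_record(example)
```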
-
construct_proposal_pools()[source]¶ Construct positive proposal pool, incomplete proposal pool and background proposal pool of the entire dataset.
-
evaluate(results, metrics='mAP', eval_dataset='thumos14', **kwargs)[source]¶ Evaluation in SSN proposal dataset.
- Parameters
results (list[dict]) – Output results.
metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘mAP’.
eval_dataset (str) – Dataset to be evaluated.
- Returns
Evaluation results for evaluation metrics.
- Return type
dict
-
get_negatives(proposals, incomplete_iou_threshold, background_iou_threshold, background_coverage_threshold=0.01, incomplete_overlap_threshold=0.7)[source]¶ Get negative proposals, including incomplete proposals and background proposals.
- Parameters
proposals (list) – List of proposal instances (SSNInstance).
incomplete_iou_threshold (float) – Maximum threshold of overlap of incomplete proposals and groundtruths.
background_iou_threshold (float) – Maximum threshold of overlap of background proposals and groundtruths.
background_coverage_threshold (float) – Minimum coverage of background proposals in video duration. Default: 0.01.
incomplete_overlap_threshold (float) – Minimum percent of incomplete proposals’ own span contained in a groundtruth instance. Default: 0.7.
- Returns
(incompletes, backgrounds), where incompletes and backgrounds are lists comprised of incomplete proposal instances and background proposal instances.
- Return type
list[SSNInstance]
-
get_positives(gts, proposals, positive_threshold, with_gt=True)[source]¶ Get positive/foreground proposals.
- Parameters
gts (list) – List of groundtruth instances (SSNInstance).
proposals (list) – List of proposal instances (SSNInstance).
positive_threshold (float) – Minimum threshold of overlap of positive/foreground proposals and groundtruths.
with_gt (bool) – Whether to include groundtruth instances in positive proposals. Default: True.
- Returns
(positives), where positives is a list comprised of positive proposal instances.
- Return type
list[SSNInstance]
-
results_to_detections(results, top_k=2000, softmax_before_filter=True, cls_top_k=2, **kwargs)[source]¶ Convert prediction results into detections.
- Parameters
results (list) – Prediction results.
top_k (int) – Number of top results. Default: 2000.
softmax_before_filter (bool) – Whether to perform softmax operations before filtering results. Default: True.
cls_top_k (int) – Number of top results for each class. Default: 2.
- Returns
Detection results.
- Return type
list
-
class
mmaction.datasets.VideoDataset(ann_file, pipeline, start_index=0, **kwargs)[source]¶ Video dataset for action recognition.
The dataset loads raw videos and applies specified transforms to return a dict containing the frame tensors and other information.
The ann_file is a text file with multiple lines, and each line indicates a sample video with the filepath and label, which are split with a whitespace. Example of an annotation file:
    some/path/000.mp4 1
    some/path/001.mp4 1
    some/path/002.mp4 2
    some/path/003.mp4 2
    some/path/004.mp4 3
    some/path/005.mp4 3
- Parameters
ann_file (str) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transforms.
start_index (int) – Specify a start index for frames in consideration of different filename format. However, when taking videos as input, it should be set to 0, since frames loaded from videos count from 0. Default: 0.
**kwargs – Keyword arguments for BaseDataset.
-
evaluate(results, metrics='top_k_accuracy', topk=(1, 5), logger=None)[source]¶ Evaluation in video dataset.
- Parameters
results (list) – Output results.
metrics (str | sequence[str]) – Metrics to be performed. Defaults: ‘top_k_accuracy’.
topk (tuple[int]) – K value for top_k_accuracy metric. Defaults: (1, 5).
logger (logging.Logger | None) – Training logger. Defaults: None.
- Returns
Evaluation results dict.
- Return type
dict
-
mmaction.datasets.build_dataloader(dataset, videos_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, seed=None, drop_last=False, pin_memory=True, **kwargs)[source]¶ Build PyTorch DataLoader.
In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.
- Parameters
dataset (Dataset) – A PyTorch dataset.
videos_per_gpu (int) – Number of videos on each GPU, i.e., batch size of each GPU.
workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.
num_gpus (int) – Number of GPUs. Only used in non-distributed training. Default: 1.
dist (bool) – Distributed training/test or not. Default: True.
shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.
seed (int | None) – Seed to be used. Default: None.
drop_last (bool) – Whether to drop the last incomplete batch in epoch. Default: False
pin_memory (bool) – Whether to use pin_memory in DataLoader. Default: True
kwargs (dict, optional) – Any keyword argument to be used to initialize DataLoader.
- Returns
A PyTorch dataloader.
- Return type
DataLoader
-
mmaction.datasets.build_dataset(cfg, default_args=None)[source]¶ Build a dataset from config dict.
- Parameters
cfg (dict) – Config dict. It should at least contain the key “type”.
default_args (dict, optional) – Default initialization arguments. Default: None.
- Returns
The constructed dataset.
- Return type
Dataset
pipelines¶
-
class
mmaction.datasets.pipelines.CenterCrop(crop_size, lazy=False)[source]¶ Crop the center area from images.
Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox”, “lazy” and “img_shape”. The required key in “lazy” is “crop_bbox”, added or modified key is “crop_bbox”.
- Parameters
crop_size (int | tuple[int]) – (w, h) of crop size.
lazy (bool) – Determine whether to apply lazy operation. Default: False.
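The crop box for a center crop follows directly from the image shape and the (w, h) crop size. The sketch below is an illustration only: the function name is hypothetical, the image shape is assumed to be given as (h, w), and the box is assumed to be returned as (left, top, right, bottom).

```python
def center_crop_bbox(img_shape, crop_size):
    """Return (left, top, right, bottom) of a centered crop."""
    img_h, img_w = img_shape
    # An int crop size means a square crop; a tuple is read as (w, h).
    if isinstance(crop_size, int):
        crop_w, crop_h = crop_size, crop_size
    else:
        crop_w, crop_h = crop_size
    left = (img_w - crop_w) // 2
    top = (img_h - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


bbox = center_crop_bbox((256, 340), 224)
```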
-
class
mmaction.datasets.pipelines.Collect(keys, meta_keys=('filename', 'label', 'original_shape', 'img_shape', 'pad_shape', 'flip_direction', 'img_norm_cfg'), meta_name='img_meta')[source]¶ Collect data from the loader relevant to the specific task.
This keeps the items in keys as they are, and collects items in meta_keys into a meta item called meta_name. This is usually the last stage of the data loader pipeline. For example, when keys=’imgs’, meta_keys=(‘filename’, ‘label’, ‘original_shape’), meta_name=’img_meta’, the results will be a dict with keys ‘imgs’ and ‘img_meta’, where ‘img_meta’ is a DataContainer of another dict with keys ‘filename’, ‘label’, ‘original_shape’.
- Parameters
keys (Sequence[str]) – Required keys to be collected.
meta_name (str) – The name of the key that contains meta information. This key is always populated. Default: “img_meta”.
meta_keys (Sequence[str]) – Keys that are collected under meta_name. The contents of the meta_name dictionary depend on meta_keys. By default this includes:
“filename”: path to the image file
“label”: label of the image file
“original_shape”: original shape of the image as a tuple (h, w, c)
“img_shape”: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right, if the batch tensor is larger than this shape.
“pad_shape”: image shape after padding
“flip_direction”: a str in (“horizontal”, “vertical”) to indicate if the image is flipped horizontally or vertically.
“img_norm_cfg”: a dict of normalization information:
mean - per channel mean subtraction
std - per channel std divisor
to_rgb - bool indicating if bgr was converted to rgb
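The collection step can be sketched with plain dicts. This is a simplified illustration under a hypothetical function name: the meta item here is an ordinary dict, whereas the description above wraps it in a DataContainer.

```python
def collect_sketch(results, keys, meta_keys, meta_name='img_meta'):
    """Keep `keys` as they are and gather `meta_keys` under `meta_name`."""
    data = {key: results[key] for key in keys}
    data[meta_name] = {key: results[key] for key in meta_keys}
    return data


results = {
    'imgs': [[0, 1], [2, 3]],          # stand-in for decoded frames
    'filename': 'some/path/000.mp4',
    'label': 1,
    'original_shape': (240, 320),
    'frame_inds': [0, 4, 8],           # dropped: listed in neither argument
}
collected = collect_sketch(
    results,
    keys=['imgs'],
    meta_keys=('filename', 'label', 'original_shape'),
)
```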
-
class
mmaction.datasets.pipelines.Compose(transforms)[source]¶ Compose a data pipeline with a sequence of transforms.
- Parameters
transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.
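Conceptually, a composed pipeline passes a results dict through each transform in order. The sketch below covers only the callable case; building transforms from config dicts needs the library's registry and is left out. The class and the two toy transforms are hypothetical, and stopping early when a transform returns None is an assumption.

```python
class ComposeSketch:
    """Apply a sequence of callables to a results dict, in order."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, results):
        for transform in self.transforms:
            results = transform(results)
            if results is None:  # assumed: a transform may reject the sample
                return None
        return results


def add_frame_inds(results):
    results['frame_inds'] = list(range(results['total_frames']))
    return results


def count_frames(results):
    results['num_frames'] = len(results['frame_inds'])
    return results


pipeline = ComposeSketch([add_frame_inds, count_frames])
out = pipeline({'total_frames': 4})
```

The order matters: each transform may rely on keys added by an earlier one, which is why the class descriptions in this section list required and added keys.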
-
class
mmaction.datasets.pipelines.DecordDecode(**kwargs)[source]¶ Using decord to decode the video.
Decord: https://github.com/dmlc/decord
Required keys are “video_reader”, “filename” and “frame_inds”, added or modified keys are “imgs” and “original_shape”.
-
class
mmaction.datasets.pipelines.DecordInit(io_backend='disk', num_threads=1, **kwargs)[source]¶ Using decord to initialize the video_reader.
Decord: https://github.com/dmlc/decord
Required keys are “filename”, added or modified keys are “video_reader” and “total_frames”.
-
class
mmaction.datasets.pipelines.DenseSampleFrames(clip_len, frame_interval=1, num_clips=1, sample_range=64, num_sample_positions=10, temporal_jitter=False, out_of_bound_opt='loop', test_mode=False)[source]¶ Select frames from the video by dense sample strategy.
Required keys are “filename”, added or modified keys are “total_frames”, “frame_inds”, “frame_interval” and “num_clips”.
- Parameters
clip_len (int) – Frames of each sampled output clip.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.
num_clips (int) – Number of clips to be sampled. Default: 1.
sample_range (int) – Total sample range for dense sample. Default: 64.
num_sample_positions (int) – Number of sample start positions, which is only used in test mode. Default: 10.
temporal_jitter (bool) – Whether to apply temporal jittering. Default: False.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
-
class
mmaction.datasets.pipelines.Flip(flip_ratio=0.5, direction='horizontal', lazy=False)[source]¶ Flip the input images with a probability.
Reverse the order of elements in the given imgs with a specific direction. The shape of the imgs is preserved, but the elements are reordered. Required keys are “imgs”, “img_shape”, “modality”, added or modified keys are “imgs”, “lazy” and “flip_direction”. No keys are required in “lazy”; added or modified keys are “flip” and “flip_direction”.
- Parameters
flip_ratio (float) – Probability of implementing flip. Default: 0.5.
direction (str) – Flip imgs horizontally or vertically. Options are “horizontal” | “vertical”. Default: “horizontal”.
lazy (bool) – Determine whether to apply lazy operation. Default: False.
-
class
mmaction.datasets.pipelines.FormatShape(input_format)[source]¶ Format final imgs shape to the given input_format.
Required keys are “imgs”, “num_clips” and “clip_len”, added or modified keys are “imgs” and “input_shape”.
- Parameters
input_format (str) – Define the final imgs format.
-
class
mmaction.datasets.pipelines.FrameSelector(*args, **kwargs)[source]¶ Deprecated class for RawFrameDecode.
-
class
mmaction.datasets.pipelines.Fuse[source]¶ Fuse lazy operations.
- Fusion order:
crop -> resize -> flip
Required keys are “imgs”, “img_shape” and “lazy”, added or modified keys are “imgs”, “lazy”. Required keys in “lazy” are “crop_bbox”, “interpolation”, “flip_direction”.
-
class
mmaction.datasets.pipelines.GenerateLocalizationLabels[source]¶ Load video label for localizer with given video_name list.
Required keys are “duration_frame”, “duration_second”, “feature_frame”, “annotations”, added or modified keys are “gt_bbox”.
-
class
mmaction.datasets.pipelines.ImageToTensor(keys)[source]¶ Convert image type to torch.Tensor type.
- Parameters
keys (Sequence[str]) – Required keys to be converted.
-
class
mmaction.datasets.pipelines.LoadLocalizationFeature(raw_feature_ext='.csv')[source]¶ Load Video features for localizer with given video_name list.
Required keys are “video_name” and “data_prefix”, added or modified keys are “raw_feature”.
- Parameters
raw_feature_ext (str) – Raw feature file extension. Default: ‘.csv’.
-
class
mmaction.datasets.pipelines.LoadProposals(top_k, pgm_proposals_dir, pgm_features_dir, proposal_ext='.csv', feature_ext='.npy')[source]¶ Loading proposals with given proposal results.
Required keys are “video_name”, added or modified keys are ‘bsp_feature’, ‘tmin’, ‘tmax’, ‘tmin_score’, ‘tmax_score’ and ‘reference_temporal_iou’.
- Parameters
top_k (int) – The top k proposals to be loaded.
pgm_proposals_dir (str) – Directory to load proposals.
pgm_features_dir (str) – Directory to load proposal features.
proposal_ext (str) – Proposal file extension. Default: ‘.csv’.
feature_ext (str) – Feature file extension. Default: ‘.npy’.
-
class
mmaction.datasets.pipelines.MultiGroupCrop(crop_size, groups)[source]¶ Randomly crop the images into several groups.
Crop the random region with the same given crop_size and bounding box into several groups. Required keys are “imgs”, added or modified keys are “imgs”, “crop_bbox” and “img_shape”.
- Parameters
crop_size (int | tuple[int]) – (w, h) of crop size.
groups (int) – Number of groups.
-
class
mmaction.datasets.pipelines.MultiScaleCrop(input_size, scales=(1), max_wh_scale_gap=1, random_crop=False, num_fixed_crops=5, lazy=False)[source]¶ Crop images with a list of randomly selected scales.
Randomly select the w and h scales from a list of scales. A scale of 1 means the base size, which is the minimum of image width and height. The gap between the scale levels of w and h is limited (see max_wh_scale_gap) to prevent too large or too small an aspect ratio. Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox”, “img_shape”, “lazy” and “scales”. The required key in “lazy” is “crop_bbox”, added or modified key is “crop_bbox”.
- Parameters
input_size (int | tuple[int]) – (w, h) of network input.
scales (tuple[float]) – Width and height scales to be selected.
max_wh_scale_gap (int) – Maximum gap of w and h scale levels. Default: 1.
random_crop (bool) – If set to True, the cropping bbox will be randomly sampled, otherwise it will be sampled from fixed regions. Default: False.
num_fixed_crops (int) – If set to 5, the cropping bbox will keep 5 basic fixed regions: “upper left”, “upper right”, “lower left”, “lower right”, “center”. If set to 13, the cropping bbox will append another 8 fixed regions: “center left”, “center right”, “lower center”, “upper center”, “upper left quarter”, “upper right quarter”, “lower left quarter”, “lower right quarter”. Default: 5.
lazy (bool) – Determine whether to apply lazy operation. Default: False.
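The scale-gap rule described above can be illustrated with a small standalone sketch. This is not the library's code: the helper name candidate_crop_sizes is hypothetical, and the sketch assumes the gap is measured between the positions of the w and h scales in the scales tuple.

```python
def candidate_crop_sizes(img_w, img_h, scales, max_wh_scale_gap=1):
    """Enumerate (w, h) crop sizes whose scale levels differ by at most the gap."""
    base_size = min(img_w, img_h)  # scale 1 corresponds to the shorter image side
    sizes = [int(base_size * s) for s in scales]
    candidates = []
    for i, crop_h in enumerate(sizes):
        for j, crop_w in enumerate(sizes):
            # Assumption: the "gap" is the index distance between the two scales.
            if abs(i - j) <= max_wh_scale_gap:
                candidates.append((crop_w, crop_h))
    return candidates


sizes = candidate_crop_sizes(340, 256, scales=(1, 0.875, 0.75, 0.66))
print(len(sizes))  # 10
```

With four scales and a gap of 1, the 16 possible (w, h) pairs shrink to 10, which excludes the most elongated crops such as (168, 256).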
-
class
mmaction.datasets.pipelines.Normalize(mean, std, to_bgr=False, adjust_magnitude=False)[source]¶ Normalize images with the given mean and std value.
Required keys are “imgs”, “img_shape”, “modality”, added or modified keys are “imgs” and “img_norm_cfg”. If modality is ‘Flow’, the additional key “scale_factor” is required.
- Parameters
mean (Sequence[float]) – Mean values of different channels.
std (Sequence[float]) – Std values of different channels.
to_bgr (bool) – Whether to convert channels from RGB to BGR. Default: False.
adjust_magnitude (bool) – Indicate whether to adjust the flow magnitude on ‘scale_factor’ when modality is ‘Flow’. Default: False.
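As a rough illustration of per-channel normalization, the sketch below applies (value - mean) / std to a single pixel. The helper normalize_pixel is hypothetical and works on plain lists rather than image arrays; the mean and std values are illustrative only, and the flow-specific adjust_magnitude behaviour is not modelled.

```python
def normalize_pixel(pixel, mean, std):
    """Normalize one pixel channel by channel: (value - mean) / std."""
    return [(value - m) / s for value, m, s in zip(pixel, mean, std)]


# Illustrative statistics only; use the values from your own config.
mean = [123.675, 116.28, 103.53]
std = [58.395, 57.12, 57.375]

# A pixel equal to the channel means maps to zeros.
print(normalize_pixel(mean, mean, std))  # [0.0, 0.0, 0.0]
```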
-
class
mmaction.datasets.pipelines.OpenCVDecode[source]¶ Using OpenCV to decode the video.
Required keys are “video_reader”, “filename” and “frame_inds”, added or modified keys are “imgs”, “img_shape” and “original_shape”.
-
class
mmaction.datasets.pipelines.OpenCVInit(io_backend='disk', **kwargs)[source]¶ Using OpenCV to initialize the video_reader.
Required keys are “filename”, added or modified keys are “new_path”, “video_reader” and “total_frames”.
-
class
mmaction.datasets.pipelines.PyAVDecode(multi_thread=False)[source]¶ Using pyav to decode the video.
PyAV: https://github.com/mikeboers/PyAV
Required keys are “video_reader” and “frame_inds”, added or modified keys are “imgs”, “img_shape” and “original_shape”.
- Parameters
multi_thread (bool) – If set to True, it will apply multi thread processing. Default: False.
-
class
mmaction.datasets.pipelines.PyAVInit(io_backend='disk', **kwargs)[source]¶ Using pyav to initialize the video.
PyAV: https://github.com/mikeboers/PyAV
Required keys are “filename”, added or modified keys are “video_reader”, and “total_frames”.
- Parameters
io_backend (str) – IO backend where frames are stored. Default: ‘disk’.
kwargs (dict) – Args for file client.
-
class
mmaction.datasets.pipelines.RandomCrop(size, lazy=False)[source]¶ Vanilla square random crop that specifies the output size.
Required keys in results are “imgs” and “img_shape”, added or modified keys are “imgs”, “lazy”; Required keys in “lazy” are “flip”, “crop_bbox”, added or modified key is “crop_bbox”.
- Parameters
size (int) – The output size of the images.
lazy (bool) – Determine whether to apply lazy operation. Default: False.
-
class
mmaction.datasets.pipelines.RandomResizedCrop(area_range=(0.08, 1.0), aspect_ratio_range=(0.75, 1.3333333333333333), lazy=False)[source]¶ Random crop that specifies the area and height-width ratio range.
Required keys in results are “imgs”, “img_shape”, “crop_bbox” and “lazy”, added or modified keys are “imgs”, “crop_bbox” and “lazy”; Required keys in “lazy” are “flip”, “crop_bbox”, added or modified key is “crop_bbox”.
- Parameters
area_range (Tuple[float]) – The candidate area scales range of output cropped images. Default: (0.08, 1.0).
aspect_ratio_range (Tuple[float]) – The candidate aspect ratio range of output cropped images. Default: (3 / 4, 4 / 3).
lazy (bool) – Determine whether to apply lazy operation. Default: False.
-
static
get_crop_bbox(img_shape, area_range, aspect_ratio_range, max_attempts=10)[source]¶ Get a crop bbox given the area range and aspect ratio range.
- Parameters
img_shape (Tuple[int]) – Image shape
area_range (Tuple[float]) – The candidate area scales range of output cropped images. Default: (0.08, 1.0).
aspect_ratio_range (Tuple[float]) – The candidate aspect ratio range of output cropped images. Default: (3 / 4, 4 / 3).
max_attempts (int) – Maximum number of attempts to generate a random candidate bounding box. If no qualified box is found, the center bounding box will be used. Default: 10.
- Returns
(list[int]) A random crop bbox within the area range and aspect ratio range.
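The retry-then-fallback behaviour can be sketched as follows. This is a standalone approximation, not the library's implementation: the name sample_crop_bbox is hypothetical, img_shape is assumed to be (h, w), the aspect ratio is assumed to be w / h and sampled log-uniformly, and the fallback is assumed to be a centered square crop.

```python
import math
import random


def sample_crop_bbox(img_shape, area_range=(0.08, 1.0),
                     aspect_ratio_range=(3 / 4, 4 / 3), max_attempts=10):
    """Return (x1, y1, x2, y2) for a random crop, or a center crop as fallback."""
    img_h, img_w = img_shape  # assumption: shape is given as (h, w)
    log_lo, log_hi = (math.log(r) for r in aspect_ratio_range)
    for _ in range(max_attempts):
        area = img_h * img_w * random.uniform(*area_range)
        # Log-uniform sampling makes a ratio r and its inverse 1/r equally likely.
        ratio = math.exp(random.uniform(log_lo, log_hi))
        crop_w = int(round(math.sqrt(area * ratio)))
        crop_h = int(round(math.sqrt(area / ratio)))
        if 0 < crop_w <= img_w and 0 < crop_h <= img_h:
            x = random.randint(0, img_w - crop_w)
            y = random.randint(0, img_h - crop_h)
            return x, y, x + crop_w, y + crop_h
    # No candidate fit inside the image: fall back to a centered square crop.
    side = min(img_h, img_w)
    x, y = (img_w - side) // 2, (img_h - side) // 2
    return x, y, x + side, y + side


print(sample_crop_bbox((240, 320), max_attempts=0))  # (40, 0, 280, 240)
```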
-
class
mmaction.datasets.pipelines.RawFrameDecode(io_backend='disk', decoding_backend='cv2', **kwargs)[source]¶ Load and decode frames with given indices.
Required keys are “frame_dir”, “filename_tmpl” and “frame_inds”, added or modified keys are “imgs”, “img_shape” and “original_shape”.
- Parameters
io_backend (str) – IO backend where frames are stored. Default: ‘disk’.
decoding_backend (str) – Backend used for image decoding. Default: ‘cv2’.
kwargs (dict, optional) – Arguments for FileClient.
-
class
mmaction.datasets.pipelines.Resize(scale, keep_ratio=True, interpolation='bilinear', lazy=False)[source]¶ Resize images to a specific size.
Required keys are “imgs”, “img_shape”, “modality”, added or modified keys are “imgs”, “img_shape”, “keep_ratio”, “scale_factor”, “lazy”, “resize_size”. No keys in “lazy” are required; the added or modified key is “interpolation”.
- Parameters
scale (float | Tuple[int]) – If keep_ratio is True, it serves as scaling factor or maximum size: If it is a float number, the image will be rescaled by this factor, else if it is a tuple of 2 integers, the image will be rescaled as large as possible within the scale. Otherwise, it serves as (w, h) of output size.
keep_ratio (bool) – If set to True, Images will be resized without changing the aspect ratio. Otherwise, it will resize images to a given size. Default: True.
interpolation (str) – Algorithm used for interpolation: “nearest” | “bilinear”. Default: “bilinear”.
lazy (bool) – Determine whether to apply lazy operation. Default: False.
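The two meanings of scale under keep_ratio=True can be sketched with a small helper. The function rescale_size below is a standalone illustration, not the library call itself; it assumes a tuple is interpreted as a (long edge, short edge) bound regardless of element order, and that sizes are rounded to the nearest integer.

```python
def rescale_size(old_size, scale):
    """Compute the (w, h) output size when the aspect ratio is preserved."""
    w, h = old_size
    if isinstance(scale, (int, float)):
        # A single number is used directly as the scaling factor.
        factor = float(scale)
    else:
        # A tuple bounds the long and short edges; pick the largest factor
        # that keeps the image within both bounds.
        max_long_edge, max_short_edge = max(scale), min(scale)
        factor = min(max_long_edge / max(h, w), max_short_edge / min(h, w))
    return int(w * factor + 0.5), int(h * factor + 0.5)


print(rescale_size((320, 240), 2.0))         # (640, 480)
print(rescale_size((320, 240), (340, 256)))  # (340, 255)
```

In the second call the long-edge bound is the tighter one (340 / 320 < 256 / 240), so the height ends up slightly below 256.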
-
class
mmaction.datasets.pipelines.SampleFrames(clip_len, frame_interval=1, num_clips=1, temporal_jitter=False, twice_sample=False, out_of_bound_opt='loop', test_mode=False, start_index=None)[source]¶ Sample frames from the video.
Required keys are “filename”, “total_frames”, “start_index”, added or modified keys are “frame_inds”, “frame_interval” and “num_clips”.
- Parameters
clip_len (int) – Frames of each sampled output clip.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.
num_clips (int) – Number of clips to be sampled. Default: 1.
temporal_jitter (bool) – Whether to apply temporal jittering. Default: False.
twice_sample (bool) – Whether to use twice sample when testing. If set to True, it will sample frames with and without fixed shift, which is commonly used for testing in TSM model. Default: False.
out_of_bound_opt (str) – The way to deal with out of bounds frame indexes. Available options are ‘loop’, ‘repeat_last’. Default: ‘loop’.
test_mode (bool) – Store True when building test or validation dataset. Default: False.
start_index (None) – This argument is deprecated and moved to dataset class (BaseDataset, VideoDataset, RawframeDataset, etc), see this: https://github.com/open-mmlab/mmaction2/pull/89.
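How clip_len, frame_interval, num_clips and out_of_bound_opt interact can be sketched with a simplified, deterministic sampler. This is an assumption-laden illustration, not the library's algorithm: the helper sample_frame_inds is hypothetical, it places each clip at the middle of its segment instead of sampling randomly, and it ignores temporal_jitter and twice_sample.

```python
def sample_frame_inds(total_frames, clip_len, frame_interval=1, num_clips=1,
                      out_of_bound_opt="loop"):
    """Return frame indices for num_clips clips of clip_len frames each."""
    ori_clip_len = clip_len * frame_interval  # span covered by one clip
    avg_interval = (total_frames - ori_clip_len + 1) / num_clips
    if avg_interval > 0:
        # Start each clip in the middle of its segment (deterministic variant).
        offsets = [int(i * avg_interval + avg_interval / 2) for i in range(num_clips)]
    else:
        # Video shorter than one clip: start at frame 0 and handle overflow below.
        offsets = [0] * num_clips
    inds = [off + k * frame_interval for off in offsets for k in range(clip_len)]
    if out_of_bound_opt == "loop":
        return [i % total_frames for i in inds]
    if out_of_bound_opt == "repeat_last":
        return [min(i, total_frames - 1) for i in inds]
    raise ValueError("Illegal out_of_bound option.")


print(sample_frame_inds(10, clip_len=4, frame_interval=2))  # [1, 3, 5, 7]
print(sample_frame_inds(5, clip_len=4, frame_interval=2))   # [0, 2, 4, 1]
print(sample_frame_inds(5, clip_len=4, frame_interval=2,
                        out_of_bound_opt="repeat_last"))    # [0, 2, 4, 4]
```

The last two calls show the difference between the two out-of-bound options: ‘loop’ wraps index 6 around to 1, while ‘repeat_last’ clamps it to the final frame.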
-
class
mmaction.datasets.pipelines.SampleProposalFrames(clip_len, body_segments, aug_segments, aug_ratio, frame_interval=1, test_interval=6, temporal_jitter=False, mode='train')[source]¶ Sample frames from proposals in the video.
Required keys are “total_frames” and “out_proposals”, added or modified keys are “frame_inds”, “frame_interval”, “num_clips”, ‘clip_len’ and ‘num_proposals’.
- Parameters
clip_len (int) – Frames of each sampled output clip.
body_segments (int) – Number of segments in course period.
aug_segments (list[int]) – Number of segments in starting and ending period.
aug_ratio (int | float | tuple[int | float]) – The ratio of the length of augmentation to that of the proposal.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 1.
test_interval (int) – Temporal interval of adjacent sampled frames in test mode. Default: 6.
temporal_jitter (bool) – Whether to apply temporal jittering. Default: False.
mode (str) – Choose ‘train’, ‘val’ or ‘test’ mode. Default: ‘train’.
-
class
mmaction.datasets.pipelines.TenCrop(crop_size)[source]¶ Crop the images into 10 crops (corner + center + flip).
Crop the four corners and the center part of the image with the same given crop_size, and flip it horizontally. Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox” and “img_shape”.
- Parameters
crop_size (int | tuple[int]) – (w, h) of crop size.
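The corner-plus-center layout can be sketched by computing the five crop boxes directly. The helper five_crop_boxes is hypothetical and assumes corners are flush with the image borders; applying the five boxes to both the image and its horizontal flip gives the ten crops.

```python
def five_crop_boxes(img_w, img_h, crop_w, crop_h):
    """Return five (x1, y1, x2, y2) boxes: four corners, then the center."""
    max_x, max_y = img_w - crop_w, img_h - crop_h
    offsets = [
        (0, 0),                     # upper left
        (max_x, 0),                 # upper right
        (0, max_y),                 # lower left
        (max_x, max_y),             # lower right
        (max_x // 2, max_y // 2),   # center
    ]
    return [(x, y, x + crop_w, y + crop_h) for x, y in offsets]


boxes = five_crop_boxes(320, 240, 224, 224)
print(boxes[-1])       # (48, 8, 272, 232)
print(2 * len(boxes))  # 10 crops once the flipped copies are included
```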
-
class
mmaction.datasets.pipelines.ThreeCrop(crop_size)[source]¶ Crop images into three crops.
Crop the images equally into three crops with equal intervals along the shorter side. Required keys are “imgs”, “img_shape”, added or modified keys are “imgs”, “crop_bbox” and “img_shape”.
- Parameters
crop_size (int | tuple[int]) – (w, h) of crop size.
-
class
mmaction.datasets.pipelines.ToDataContainer(fields)[source]¶ Convert the data to DataContainer.
- Parameters
fields (Sequence[dict]) – Required fields to be converted with keys and attributes. E.g. fields=(dict(key=’gt_bbox’, stack=False),).
-
class
mmaction.datasets.pipelines.ToTensor(keys)[source]¶ Convert some values in results dict to torch.Tensor type in data loader pipeline.
- Parameters
keys (Sequence[str]) – Required keys to be converted.
-
class
mmaction.datasets.pipelines.Transpose(keys, order)[source]¶ Transpose image channels to a given order.
- Parameters
keys (Sequence[str]) – Required keys to be converted.
order (Sequence[int]) – Image channel order.
-
class
mmaction.datasets.pipelines.UntrimmedSampleFrames(clip_len=1, frame_interval=16, start_index=1)[source]¶ Sample frames from the untrimmed video.
Required keys are “filename”, “total_frames”, added or modified keys are “frame_inds”, “frame_interval” and “num_clips”.
- Parameters
clip_len (int) – The length of sampled clips. Default: 1.
frame_interval (int) – Temporal interval of adjacent sampled frames. Default: 16.
start_index (int) – Specify a start index for frames in consideration of different filename format. However, when taking videos as input, it should be set to 0, since frames loaded from videos count from 0. Default: 1.
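A minimal sketch of dense sampling over an untrimmed video follows. It rests on assumptions not stated above: one clip is centered every frame_interval frames, indices are clamped to the valid range, and start_index is added at the end. The helper name untrimmed_frame_inds is hypothetical.

```python
def untrimmed_frame_inds(total_frames, clip_len=1, frame_interval=16, start_index=1):
    """Return frame indices for clips centered every frame_interval frames."""
    centers = range(frame_interval // 2, total_frames, frame_interval)
    offsets = range(-(clip_len // 2), clip_len - clip_len // 2)
    inds = []
    for center in centers:
        for offset in offsets:
            # Clamp to a valid zero-based frame, then shift by start_index.
            frame = min(max(center + offset, 0), total_frames - 1)
            inds.append(frame + start_index)
    return inds


print(untrimmed_frame_inds(40))                             # [9, 25]
print(untrimmed_frame_inds(40, clip_len=3, start_index=0))  # [7, 8, 9, 23, 24, 25]
```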
samplers¶
mmaction.utils¶
-
mmaction.utils.get_random_string(length=15)[source]¶ Get random string with letters and digits.
- Parameters
length (int) – Length of random string. Default: 15.
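An equivalent helper can be written with the standard library alone. The sketch below is an illustration under the name random_string, not the library's source.

```python
import random
import string


def random_string(length=15):
    """Build a random string from ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choice(alphabet) for _ in range(length))


print(len(random_string()))  # 15
```

Such strings are handy for temporary file or directory names, but random is not a cryptographic source, so they should not be used as secrets.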
-
mmaction.utils.get_root_logger(log_file=None, log_level=20)[source]¶ Use get_logger method in mmcv to get the root logger.
The logger will be initialized if it has not been initialized. By default a StreamHandler will be added. If log_file is specified, a FileHandler will also be added. The name of the root logger is the top-level package name, e.g., “mmaction”.
- Parameters
log_file (str | None) – The log filename. If specified, a FileHandler will be added to the root logger.
log_level (int) – The root logger level. Note that only the process of rank 0 is affected, while other processes will set the level to “Error” and be silent most of the time.
- Returns
The root logger.
- Return type
logging.Logger
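The handler setup described above can be approximated with the standard logging module. The sketch below is not the mmcv implementation: the function build_root_logger and the logger name "mmaction_sketch" are hypothetical, and rank handling for distributed runs is omitted. Note that the default log_level=20 equals logging.INFO.

```python
import logging


def build_root_logger(name="mmaction_sketch", log_file=None, log_level=logging.INFO):
    """Create the named logger once; later calls return the same object."""
    logger = logging.getLogger(name)
    if logger.handlers:
        # Already initialized: do not attach duplicate handlers.
        return logger
    handlers = [logging.StreamHandler()]
    if log_file is not None:
        handlers.append(logging.FileHandler(log_file, "w"))
    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    for handler in handlers:
        handler.setFormatter(formatter)
        handler.setLevel(log_level)
        logger.addHandler(handler)
    logger.setLevel(log_level)
    return logger


logger = build_root_logger()
logger.info("logger ready")
```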
mmaction.localization¶
-
mmaction.localization.eval_ap(detections, gt_by_cls, iou_range)[source]¶ Evaluate average precisions.
- Parameters
detections (dict) – Results of detections.
gt_by_cls (dict) – Information of ground truth.
iou_range (list) – Ranges of iou.
- Returns
Average precision values of classes at ious.
- Return type
list
-
mmaction.localization.generate_bsp_feature(video_list, video_infos, tem_results_dir, pgm_proposals_dir, top_k=1000, bsp_boundary_ratio=0.2, num_sample_start=8, num_sample_end=8, num_sample_action=16, num_sample_interp=3, tem_results_ext='.csv', pgm_proposal_ext='.csv', result_dict=None)[source]¶ Generate Boundary-Sensitive Proposal Feature with given proposals.
- Parameters
video_list (list[int]) – List of video indexes to generate bsp_feature.
video_infos (list[dict]) – List of video_info dict that contains ‘video_name’.
tem_results_dir (str) – Directory to load temporal evaluation results.
pgm_proposals_dir (str) – Directory to load proposals.
top_k (int) – Number of proposals to be considered. Default: 1000
bsp_boundary_ratio (float) – Ratio for proposal boundary (start/end). Default: 0.2.
num_sample_start (int) – Num of samples for actionness in start region. Default: 8.
num_sample_end (int) – Num of samples for actionness in end region. Default: 8.
num_sample_action (int) – Num of samples for actionness in center region. Default: 16.
num_sample_interp (int) – Num of samples for interpolation for each sample point. Default: 3.
tem_results_ext (str) – File extension for temporal evaluation model output. Default: ‘.csv’.
pgm_proposal_ext (str) – File extension for proposals. Default: ‘.csv’.
result_dict (dict) – The dict to save the results. Default: None.
- Returns
A dict containing video_name as keys and bsp_feature as values. If result_dict is not None, the results are saved to it.
- Return type
bsp_feature_dict (dict)
-
mmaction.localization.generate_candidate_proposals(video_list, video_infos, tem_results_dir, temporal_scale, peak_threshold, tem_results_ext='.csv', result_dict=None)[source]¶ Generate Candidate Proposals with given temporal evaluation results. Each proposal file will contain: ‘tmin,tmax,tmin_score,tmax_score,score,match_iou,match_ioa’.
- Parameters
video_list (list[int]) – List of video indexes to generate proposals.
video_infos (list[dict]) – List of video_info dict that contains ‘video_name’, ‘duration_frame’, ‘duration_second’, ‘feature_frame’, and ‘annotations’.
tem_results_dir (str) – Directory to load temporal evaluation results.
temporal_scale (int) – The number (scale) on temporal axis.
peak_threshold (float) – The threshold for proposal generation.
tem_results_ext (str) – File extension for temporal evaluation model output. Default: ‘.csv’.
result_dict (dict) – The dict to save the results. Default: None.
- Returns
A dict containing video_name as keys and proposal lists as values. If result_dict is not None, the results are saved to it.
- Return type
dict
-
mmaction.localization.load_localize_proposal_file(filename)[source]¶ Load the proposal file and split it into many parts which contain one video’s information separately.
- Parameters
filename (str) – Path to the proposal file.
- Returns
List of all videos’ information.
- Return type
list
-
mmaction.localization.perform_regression(detections)[source]¶ Perform regression on detection results.
- Parameters
detections (list) – Detection results before regression.
- Returns
Detection results after regression.
- Return type
list
-
mmaction.localization.soft_nms(proposals, alpha, low_threshold, high_threshold, top_k)[source]¶ Soft NMS for temporal proposals.
- Parameters
proposals (np.ndarray) – Proposals generated by network.
alpha (float) – Alpha value of Gaussian decaying function.
low_threshold (float) – Low threshold for soft nms.
high_threshold (float) – High threshold for soft nms.
top_k (int) – Top k values to be considered.
- Returns
The updated proposals.
- Return type
new_proposals (np.ndarray)
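A pure-Python sketch of Gaussian soft NMS over temporal proposals is given below. It is an approximation under stated assumptions rather than the library's code: proposals are plain (t_start, t_end, score) tuples with times normalized to [0, 1] instead of an np.ndarray, and the decay threshold is assumed to grow linearly from low_threshold to high_threshold with the width of the selected proposal.

```python
import math


def soft_nms_sketch(proposals, alpha, low_threshold, high_threshold, top_k):
    """Keep up to top_k proposals, decaying the scores of overlapping ones."""
    remaining = list(proposals)
    kept = []
    while remaining and len(kept) < top_k:
        best = max(remaining, key=lambda p: p[2])
        remaining.remove(best)
        kept.append(best)
        width = best[1] - best[0]
        # Assumption: wider selected proposals tolerate more overlap.
        threshold = low_threshold + (high_threshold - low_threshold) * width
        decayed = []
        for start, end, score in remaining:
            inter = max(min(end, best[1]) - max(start, best[0]), 0.0)
            union = (end - start) + width - inter
            iou = inter / union if union > 0 else 0.0
            if iou > threshold:
                # Gaussian decay instead of hard removal.
                score *= math.exp(-iou * iou / alpha)
            decayed.append((start, end, score))
        remaining = decayed
    return kept


props = [(0.0, 0.5, 0.9), (0.05, 0.5, 0.8), (0.6, 0.9, 0.7)]
for start, end, score in soft_nms_sketch(props, alpha=0.4, low_threshold=0.5,
                                         high_threshold=0.9, top_k=3):
    print(start, end, round(score, 3))
```

In this example the second proposal overlaps the top-scoring one heavily, so its score is decayed and it drops below the non-overlapping third proposal in the output order.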
-
mmaction.localization.temporal_iop(proposal_min, proposal_max, gt_min, gt_max)[source]¶ Compute IoP score between a groundtruth bbox and the proposals.
Compute the IoP which is defined as the overlap ratio with groundtruth proportional to the duration of this proposal.
- Parameters
proposal_min (list[float]) – List of temporal anchor min.
proposal_max (list[float]) – List of temporal anchor max.
gt_min (float) – Groundtruth temporal box min.
gt_max (float) – Groundtruth temporal box max.
- Returns
List of intersection over anchor scores.
- Return type
scores (list[float])
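The IoP definition can be made concrete with a list-based sketch. The function temporal_iop_sketch is an illustration only; it assumes every proposal has positive duration and uses plain Python lists rather than arrays.

```python
def temporal_iop_sketch(proposal_min, proposal_max, gt_min, gt_max):
    """Intersection with the ground truth divided by each proposal's duration."""
    scores = []
    for p_min, p_max in zip(proposal_min, proposal_max):
        inter = max(min(p_max, gt_max) - max(p_min, gt_min), 0.0)
        scores.append(inter / (p_max - p_min))
    return scores


print(temporal_iop_sketch([0.0, 4.0], [10.0, 8.0], 5.0, 10.0))  # [0.5, 0.75]
```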
-
mmaction.localization.temporal_iou(proposal_min, proposal_max, gt_min, gt_max)[source]¶ Compute IoU score between a groundtruth bbox and the proposals.
- Parameters
proposal_min (list[float]) – List of temporal anchor min.
proposal_max (list[float]) – List of temporal anchor max.
gt_min (float) – Groundtruth temporal box min.
gt_max (float) – Groundtruth temporal box max.
- Returns
List of iou scores.
- Return type
jaccard (list[float])
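For comparison with IoP, the temporal IoU divides the intersection by the union of the proposal and the ground-truth segment. The sketch below uses the hypothetical name temporal_iou_sketch and plain lists, and assumes segments of positive duration.

```python
def temporal_iou_sketch(proposal_min, proposal_max, gt_min, gt_max):
    """Intersection over union between each proposal and one ground-truth segment."""
    gt_len = gt_max - gt_min
    scores = []
    for p_min, p_max in zip(proposal_min, proposal_max):
        inter = max(min(p_max, gt_max) - max(p_min, gt_min), 0.0)
        union = (p_max - p_min) + gt_len - inter
        scores.append(inter / union)
    return scores


print(temporal_iou_sketch([0.0, 4.0], [10.0, 8.0], 5.0, 10.0))  # [0.5, 0.5]
```

The second proposal, (4, 8), overlaps the ground truth (5, 10) by 3 out of its own 4 units, yet its IoU is only 0.5 because the union spans 6 units.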