Manage datasets

This section explains how to upload custom datasets to Deep Learning Studio.

How to add datasets in Deep Learning Studio (DLS)

There are two ways to upload a dataset:

1) From Datasets tab

2) From File browser

  • Web-based

  • Native browser

1) How to add dataset using "Datasets tab"

  • Click the "Datasets" tab in the left navigation bar

  • Click the upload icon at the top right to upload a dataset

  • Clicking the upload icon opens a pop-up console where you can drag and drop the zipped dataset folder

Important!

  • Make sure you have zipped the dataset folder before uploading it.

  • The zipped dataset must be smaller than 1 GB.

  • Drag/Drop the zipped folder to the datasets canvas

  • Select dataset format by clicking on the drop-down button

  • Dataset format has three formats:

    • DLS Native

    • Image Folder Dataset

    • MS COCO Dataset

  • Click on Start Upload

    • The upload may take a few seconds to complete, depending on the size of the dataset.
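
The zipping step above can also be scripted. The following is a minimal Python sketch, not part of DLS itself: the function name zip_dataset is hypothetical, and it assumes DLS expects the dataset folder itself (not just its contents) at the top level of the archive, which this guide does not state explicitly.

```python
import shutil
from pathlib import Path


def zip_dataset(dataset_dir: str, output_dir: str = ".") -> Path:
    """Zip a dataset folder and return the path of the created archive.

    Hypothetical helper: the archive keeps the dataset folder as its
    top-level entry (an assumption about what DLS expects).
    """
    src = Path(dataset_dir).resolve()
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a folder")
    base_name = Path(output_dir).resolve() / src.name
    # root_dir/base_dir make every archive entry start with "<folder name>/".
    archive = shutil.make_archive(
        str(base_name), "zip", root_dir=src.parent, base_dir=src.name
    )
    return Path(archive)


# Example usage (paths are placeholders):
# archive = zip_dataset("my_dataset", output_dir="uploads")
# print(archive, archive.stat().st_size, "bytes")
```

Printing the archive size, as in the commented usage, is a quick way to confirm the file is under the upload limit before dragging it onto the canvas.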

2) How to upload dataset from "File Browser"

1) How to upload datasets using "Native file browser"

  • Click "File Browser" in the left navigation bar, then select "Native". This opens the file explorer at <DLS Folder>/user_data/1/

  • Open the "dataset" folder

  • Create/Copy/Move your custom dataset

  • Download dataset_config.yaml and place it in your dataset folder

dataset_format : <dataset format>
source : Upload

You may need to modify this config file: update the dataset_format value in dataset_config.yaml to match your dataset.
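
Editing the config file can be done by hand or with a short script. The sketch below is an illustration only: set_dataset_format is a hypothetical helper, and because this section does not list the exact strings DLS accepts for dataset_format, the value is left as a parameter. It rewrites only the dataset_format line and leaves every other line of the downloaded file untouched.

```python
from pathlib import Path


def set_dataset_format(config_path: str, dataset_format: str) -> None:
    """Replace the dataset_format line in the config file.

    Hypothetical helper; the accepted format strings are not specified
    in this guide, so the caller must supply a valid one.
    """
    path = Path(config_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    new_line = f"dataset_format : {dataset_format}"
    for i, line in enumerate(lines):
        # Match the key with or without a space before the colon.
        if line.split(":", 1)[0].strip() == "dataset_format":
            lines[i] = new_line
            break
    else:
        # No dataset_format line was found, so append one.
        lines.append(new_line)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


# Example usage (path and value are placeholders):
# set_dataset_format("dataset/my_dataset/dataset_config.yaml", "<dataset format>")
```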

The sections below describe which dataset format to select for a custom dataset.

Upload Instructions

  • Use the DLS Datasets tab for zipped datasets smaller than 1 GB (recommended)

  • Use the DLS "Native" File Browser for datasets larger than 1 GB.
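
The size rule above can be checked programmatically. This is a rough sketch under stated assumptions: choose_upload_route is a hypothetical name, the guide does not say whether 1 GB means 10**9 or 2**30 bytes (the stricter 10**9 is used here), and a file of exactly the limit is routed to the Native File Browser because the guide does not cover that case.

```python
import os

# Assumption: the stricter decimal reading of "1 GB".
ONE_GB = 10 ** 9


def choose_upload_route(zip_path: str, limit_bytes: int = ONE_GB) -> str:
    """Suggest an upload route based on the size of the zipped dataset."""
    size = os.path.getsize(zip_path)
    # Strictly below the limit: Datasets tab. Otherwise: Native File Browser.
    return "Datasets tab" if size < limit_bytes else "Native File Browser"


# Example usage (path is a placeholder):
# print(choose_upload_route("uploads/my_dataset.zip"))
```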

Dataset Formats

1. DLS Native

  1. Folder format dataset for image classification

    root/class_x/xxx.ext
    root/class_x/xxy.ext
    root/class_x/xxz.ext
    root/class_y/123.ext
    root/class_y/nsdf3.ext
    root/class_y/asd932_.ext

  2. CSV file: the file must be named train.csv and have two or more columns

    e.g. to create an IMDB-like dataset:

  • text : Encode the text as a string of semicolon-separated numbers. Pad as needed to maintain a fixed length of the sequence.

  • label: rating 1
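
The text encoding described above can be sketched in Python as follows. Everything beyond the column names and the semicolon-separated, fixed-length encoding is an assumption made for illustration: the vocabulary, the padding id 0, the unknown-word id 1, padding on the right, and truncation of longer texts are not specified in this guide.

```python
import csv


def encode_text(text, vocab, seq_len, pad_id=0, unk_id=1):
    """Map words to integer ids, then truncate or right-pad to seq_len."""
    ids = [vocab.get(word, unk_id) for word in text.lower().split()]
    ids = ids[:seq_len] + [pad_id] * max(0, seq_len - len(ids))
    return ";".join(str(i) for i in ids)


def write_train_csv(path, samples, vocab, seq_len):
    """Write (text, label) pairs to a CSV file with 'text' and 'label' columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text", "label"])
        for text, label in samples:
            writer.writerow([encode_text(text, vocab, seq_len), label])


# Example usage with a toy vocabulary (hypothetical data):
# vocab = {"good": 2, "movie": 3}
# write_train_csv("train.csv", [("Good movie", 1)], vocab, seq_len=4)
```

With the toy vocabulary in the commented usage, "Good movie" is encoded as 2;3;0;0 for a sequence length of 4.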

You can also refer to the How-to Videos for uploading a DLS Native custom dataset.

In Deep Learning Studio, the DLS Native format dataset can only be used for Custom Neural Network project types.

2. Image folder format

  • This dataset folder contains only images.

A dataset for loading image files stored in a folder structure.

root
├── test
│   ├── brick
│   │   ├── brick_001968.jpg
│   │   └── brick_001981.jpg
│   ├── water
│   │   ├── water_002256.jpg
│   │   └── water_002296.jpg
│   └── wood
│       ├── wood_000770.jpg
│       └── wood_000793.jpg
├── train
│   ├── brick
│   │   ├── brick_000593.jpg
│   │   └── brick_002089.jpg
│   ├── carpet
│   │   ├── carpet_002084.jpg
│   │   └── carpet_002375.jpg
│   └── wood
│       ├── wood_002278.jpg
│       └── wood_002391.jpg
└── val
    ├── brick
    │   ├── brick_000168.jpg
    │   └── brick_002137.jpg
    ├── water
    │   ├── water_000792.jpg
    │   └── water_000797.jpg
    └── wood
        ├── wood_001844.jpg
        └── wood_002146.jpg

In Deep Learning Studio, the Image Folder format dataset can only be used for the AI APP Module Classification project type.
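
Before uploading an Image Folder dataset, it can help to confirm that the folder tree matches the layout shown above. The sketch below is a hypothetical helper (summarize_image_folder is not a DLS function) and assumes that the train, val, and test folders are all present, as in the example; whether DLS requires all three is not stated here.

```python
from pathlib import Path


def summarize_image_folder(root: str, splits=("train", "val", "test")):
    """Return {split: {class_name: file_count}} for a split/class/file layout."""
    summary = {}
    for split in splits:
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # Each sub-folder of a split is treated as one class; files are counted.
        summary[split] = {
            class_dir.name: sum(1 for p in class_dir.iterdir() if p.is_file())
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return summary


# Example usage (path is a placeholder):
# for split, classes in summarize_image_folder("root").items():
#     print(split, classes)
```

Comparing the class names across splits in the output makes it easy to spot a class that appears in one split but not in another.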

3. MS COCO Dataset

  • An MS COCO dataset contains 2 folders:

  1. Images folder (contains the images)

  2. Annotations folder (contains 2 JSON annotation files for the images)

├── annotations
│   ├── instances_train2017.json
│   └── instances_val2017.json
└── images
    ├── 000000000074.jpg
    ├── 000000000109.jpg
    ├── 000000008458.jpg
    ├── 000000008781.jpg
    ├── 000000008787.jpg
    ├── 000000008821.jpg
    ├── 000000016775.jpg
    ├── 000000016957.jpg
    ├── 000000024664.jpg
    ├── 000000024861.jpg
    ├── 000000024935.jpg
    ├── 000000025148.jpg
    ├── 000000025234.jpg
    ├── 000000033325.jpg
    ├── 000000033377.jpg
    ├── 000000033405.jpg
    ├── 000000033444.jpg
    ├── 000000041311.jpg
    ├── 000000041552.jpg
    ├── 000000041568.jpg
    ├── 000000049814.jpg
    ├── 000000052891.jpg
    └── 000000581654.jpg

In Deep Learning Studio, the MS COCO format dataset can only be used for the AI APP Module Segmentation project type.
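
A quick consistency check between the annotation files and the images folder can catch missing files before upload. The following sketch relies on the standard COCO JSON layout, in which the top-level "images" list holds one entry per image with a "file_name" field. The function name find_missing_images is hypothetical, and the sketch assumes file_name holds bare file names without sub-folders.

```python
import json
from pathlib import Path


def find_missing_images(root: str, annotation_file: str) -> list:
    """Return file names listed in the annotation JSON but absent from images/."""
    root_path = Path(root)
    with open(root_path / "annotations" / annotation_file, encoding="utf-8") as f:
        coco = json.load(f)
    present = {p.name for p in (root_path / "images").iterdir()}
    return sorted(
        img["file_name"] for img in coco["images"] if img["file_name"] not in present
    )


# Example usage (root path is a placeholder):
# for name in ("instances_train2017.json", "instances_val2017.json"):
#     print(name, find_missing_images("my_coco_dataset", name))
```

An empty list for both annotation files means every annotated image is present in the images folder.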

CVAT Dataset

  • You can refer to the CVAT Dataset link for detailed information.