| UCI | Collection of benchmark datasets for regression and classification tasks | UCI Machine Learning Repository |
| KDD | Extended version of UCI datasets | UCI KDD Extended Version |
| DELVE | Platform for comparatively assessing learning methods on regression and classification tasks | DELVE |
| DMOZ | Collection of links to various datasets | DMOZ Directory |
| KDnuggets | Collection of links to various datasets | Further Datasets |
| ChemDB | Chemical data that can be used as datasets for machine learning | ChemDB |
| Golem | Datasets for learning prediction rules | Golem Datasets |
| NDR | Datasets for nonlinear dimensionality reduction | Nonlinear Dimensionality Reduction |
| General | List of dataset links organized by category | Further Datasets |
| AWS Public | Public datasets hosted on Amazon S3 | Large Dataset Repository |
| Datahub | Public list of datasets | Datahub Datasets |
| BigML | Curated list of datasets | BigML Datasets |
| Curated GitHub | Curated, categorized list of datasets on GitHub | Public Datasets on GitHub |
| Wikipedia List | Curated, categorized list of datasets on Wikipedia | Datasets for ML |
| Data Science | Free public data sources for data science projects | 19 Free Public Data Sources |
| Data Science | Datasets for data science projects | Data Science Datasets |