IMDB open data movies parsing
Author: cloud
Last update: 2017-01-31

Choose an FTP server from IMDB opendata, download movies.list.gz, then parse each line with this regexp taken from imdb-data-parser:

# captures (0-based, matching the indices of MatchData#captures):
#   0: #TITLE (UNIQUE KEY)
#   1: (.*? \(\S{4,}\))                    movie name + year
#   2: (\(\S+\))                           type ex:(TV)
#   3: (\{(.*?) ?(\(\S+?\))?\})            series info ex: {Ally Abroad (#3.1)}
#   4: (.*?)                               episode name ex: Ally Abroad
#   5: (\(\S+?\))                          episode number ex: (#3.1)
#   6: (\{\{SUSPENDED\}\})                 is suspended?
#   7: (.*)                                year
re = /((.*? \(\S{4,}\)) ?(\(\S+\))? ?(?!\{\{SUSPENDED\}\})(\{(.*?) ?(\(\S+?\))?\})? ?(\{\{SUSPENDED\}\})?)\t+(.*)$/
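
As a quick check, the regexp can be applied to a single line and the groups read back through MatchData#captures, whose 0-based indices line up with the numbering in the comments. This is only a minimal sketch: the sample lines are made up to mimic the movies.list layout (title, tabs, year), and parse_movie_line is a hypothetical helper name, not something from imdb-data-parser.

```ruby
# Same regexp as above, repeated so this sketch runs on its own.
MOVIE_RE = /((.*? \(\S{4,}\)) ?(\(\S+\))? ?(?!\{\{SUSPENDED\}\})(\{(.*?) ?(\(\S+?\))?\})? ?(\{\{SUSPENDED\}\})?)\t+(.*)$/

# Hypothetical helper: returns a hash of the captures, or nil when the line
# does not look like a movie entry (header lines, separators, ...).
def parse_movie_line(line)
  m = MOVIE_RE.match(line.chomp)
  return nil unless m

  c = m.captures # 0-based, same indices as in the comments
  {
    title:          c[0],
    name_and_year:  c[1],
    type:           c[2],
    episode_name:   c[4],
    episode_number: c[5],
    suspended:      !c[6].nil?,
    year:           c[7]
  }
end

# Made-up sample lines in the movies.list layout.
movie   = parse_movie_line("Some Movie (2014) (TV)\t\t\t2014")
episode = parse_movie_line("\"Some Show\" (2012) {Pilot (#1.1)}\t\t2012")

p movie[:type]
p episode[:episode_name]
p episode[:year]
```

Note that m[0] would be the whole match (title, tabs and year together), which is why the sketch goes through captures instead.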

See the attached script imdb_movies_dump.rb, which dumps the movie titles from the 2010-2016 range, sorted by year.
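
The attached script is not reproduced here. The following is only a rough sketch of how such a dump could look, under a few assumptions of mine: the list is encoded as ISO-8859-1, episodes and suspended entries are skipped, and the leading number of the last column is used as the year. The names titles_by_year and dump_movies are hypothetical.

```ruby
require "zlib"

MOVIE_RE = /((.*? \(\S{4,}\)) ?(\(\S+\))? ?(?!\{\{SUSPENDED\}\})(\{(.*?) ?(\(\S+?\))?\})? ?(\{\{SUSPENDED\}\})?)\t+(.*)$/

# Hypothetical helper: collects [year, title] pairs for entries whose year
# falls in `range`, sorted by year and then by title.
# `lines` is anything that responds to #each and yields one line at a time.
def titles_by_year(lines, range = 2010..2016)
  rows = []
  lines.each do |raw|
    # Assumption: the list is ISO-8859-1; convert to UTF-8 before matching.
    line = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
    m = MOVIE_RE.match(line.chomp)
    next unless m

    c = m.captures
    next if c[3] || c[6] # skip episodes ({...}) and {{SUSPENDED}} entries

    year = c[7].to_i # "????" becomes 0 and falls outside the range
    rows << [year, c[0]] if range.cover?(year)
  end
  rows.sort
end

# Reads the gzipped list and prints one "year<TAB>title" line per entry.
def dump_movies(path)
  Zlib::GzipReader.open(path) do |gz|
    titles_by_year(gz).each { |year, title| puts "#{year}\t#{title}" }
  end
end

# dump_movies("movies.list.gz")
```

Keeping the filtering in titles_by_year, separate from the file handling, makes it easy to try on a handful of lines before running it over the whole list.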