IMDB open data movies parsing attachment
Last update
2017-01-31
Choose an FTP server from the IMDb open data mirrors, download movies.list.gz, then parse it with this useful regexp from imdb-data-parser:
    # captures:
    # 0: #TITLE (UNIQUE KEY)
    # 1: (.*? \(\S{4,}\)) movie name + year
    # 2: (\(\S+\)) type ex:(TV)
    # 3: (\{(.*?) ?(\(\S+?\))?\}) series info ex: {Ally Abroad (#3.1)}
    # 4: (.*?) episode name ex: Ally Abroad
    # 5: (\(\S+?\)) episode number ex: (#3.1)
    # 6: (\{\{SUSPENDED\}\}) is suspended?
    # 7: (.*) year
    re = /((.*? \(\S{4,}\)) ?(\(\S+\))? ?(?!\{\{SUSPENDED\}\})(\{(.*?) ?(\(\S+?\))?\})? ?(\{\{SUSPENDED\}\})?)\t+(.*)$/
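To see what the capture groups return, the sketch below applies the same pattern to two sample lines. The sample lines are invented for illustration and only mimic the tab-separated layout the pattern expects. The constant `MOVIE_RE` and the helper `parse_line` are hypothetical names, not part of imdb-data-parser. The indices in the comment above are 0-based positions in `MatchData#captures`, so capture 0 is `m[1]` in Ruby.

```ruby
# Same pattern as above, stored in a constant (MOVIE_RE is a hypothetical name).
MOVIE_RE = /((.*? \(\S{4,}\)) ?(\(\S+\))? ?(?!\{\{SUSPENDED\}\})(\{(.*?) ?(\(\S+?\))?\})? ?(\{\{SUSPENDED\}\})?)\t+(.*)$/

# Hypothetical helper: returns a hash of named fields,
# or nil when the line does not match the pattern.
def parse_line(line)
  m = MOVIE_RE.match(line)
  return nil unless m

  # Destructure in the same order as the "captures" comment (0..7).
  title, name_year, type, _series, episode_name, episode_number, suspended, year = m.captures
  {
    title: title,
    name_year: name_year,
    type: type,
    episode_name: episode_name,
    episode_number: episode_number,
    suspended: !suspended.nil?,
    year: year
  }
end

# Made-up sample lines: title, one or more tabs, then the year column.
samples = [
  "\"Some Show\" (2010) {Ally Abroad (#3.1)}\t2010",
  "Some Movie (2012) (TV)\t\t\t2012"
]
samples.each { |line| p parse_line(line) }
```

Optional groups that do not take part in the match come back as `nil`, which is why the helper turns the suspended marker into a boolean.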
See the attached script imdb_movies_dump.rb, which dumps movie titles sorted by year for the 2010-2016 range.
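The attached script is not reproduced here. As a rough idea of how such a dump could work, here is a minimal sketch; it is not the script itself. The names `MOVIE_RE`, `titles_by_year` and `dump_titles` are hypothetical. The sketch assumes the list is Latin-1 encoded and that a usable year column starts with four digits; entries with any other year value are skipped.

```ruby
require 'zlib'

# Same pattern as above (MOVIE_RE is a hypothetical name).
MOVIE_RE = /((.*? \(\S{4,}\)) ?(\(\S+\))? ?(?!\{\{SUSPENDED\}\})(\{(.*?) ?(\(\S+?\))?\})? ?(\{\{SUSPENDED\}\})?)\t+(.*)$/

# Hypothetical helper: collects [year, title] pairs from any object that
# responds to each_line, keeping only years inside the given range.
def titles_by_year(io, years = 2010..2016)
  rows = []
  io.each_line do |raw|
    # Assumption: the file is Latin-1; transcode so the output is UTF-8.
    line = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
    m = MOVIE_RE.match(line)
    next unless m

    # Keep only year columns that start with four digits.
    year = m.captures.last[/\A\d{4}/]
    next unless year && years.cover?(year.to_i)

    rows << [year.to_i, m[1]]
  end
  rows.sort # by year first, then by title
end

# Hypothetical entry point: reads the gzipped list and prints "year<TAB>title".
def dump_titles(path, out = $stdout)
  Zlib::GzipReader.open(path) do |gz|
    titles_by_year(gz).each { |year, title| out.puts "#{year}\t#{title}" }
  end
end
```

Calling `dump_titles('movies.list.gz')` would then print one line per matching title. Lines that the pattern does not match, such as the file's header text, are simply ignored.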