Cryptocurrency analysis and predictions using AI and big data

Discussion in 'CryptoCurrency' started by healzer, Jan 3, 2018.

  1. healzer

    healzer Jr. VIP Jr. VIP

    Joined:
    Jun 26, 2011
    Messages:
    2,949
    Likes Received:
    2,932
    Gender:
    Male
    Location:
    Somewhere in Europe
    Home Page:
    @elavmunretea
    I love your input :D ! Thanks a lot.
    I do agree with you on most points, since most of those are already on my to-do list and some already under development.
    Unfortunately I haven't had the time to write every single thing I plan to do (and have already done).

    =========
    ========= Since Jan. 1 to Jan. 7, 2018
    =========

    Shortly after our nice findings I started working on a few additional features:

    • I added some additional news sites to my data (this was a 10 min job lol).
    • I've tweaked and optimized the trendline functions, since they contained bugs here and there.
    • A lot of time (too much) went into fixing bugs, and new ones are popping up every day... well that's life, never give up! :D
    • Added the ability to visually move a certain line (to the left or to the right). This is particularly useful to better understand peaks and to make up scenarios in my head of what could've happened at that time:
      [​IMG]

    • Sentiment analysis. I have done quite some reading (academic papers and professional advice) about this topic.
      If you don't know, sentiment analysis determines whether a certain text/article/tweet is Negative or Positive.
      The reality is that it is very complex, and I read this nice piece of advice:
      "A negative article/text can still yield positive economic benefits"

      There are two primary methods to do sentiment analysis: the hard way and the easy way.
      The hard way is to incorporate machine-learning (usually supervised) where we teach the computer whether X is negative/positive (this is difficult in practice).
      The easy way is, well, much simpler in my case. I was lucky enough to have found a list of thousands of words, each labeled "neg" or "pos".
      More importantly, these words were specifically chosen and labeled for stock-market analysis. Their results were slightly above a random pick -- so it will do for now :)

      Each tweet/post from Twitter/Reddit that came through (thousands every minute) was run through my sentiment-analysis algorithm, which works like this:
      It detects and counts the number of positive and the number of negative word occurrences, then computes #pos - #neg = delta.
      Finally the delta value is normalized and converted to a percentage, so we can easily plot it on our graph together with Hype %.

      Here is a snippet of the data at some random time period I chose:
      [​IMG]
      The blue line is the Hype (social mentions).
      The dark-red line represents the Sentiments at each interval. Notice that these values can be negative, which indicates an average negative sentiment at a certain time period.
      I have drawn the red horizontal line to indicate that all above it is positive, and all below is negative.

      Notice that the dark-red line is very similar to the Hype, which makes sense since it is derived from it.
      From these graphs we cannot draw any conclusion, apart from seeing which periods have more "negative" than "positive" mentions and how the sentiment evolves.

      When we add the "BTC Price" graph then it really becomes a mess.
      I thought that sentiment would follow the Hype graph, but often it doesn't.
      Below is a chart with BTC Price in black, and Sentiment in dark-red: (trendlines made of 4 consecutive points)
      [​IMG]
      It is one of the few regions I have seen where both have some kind of visual relationship.
      At the start of the graph, the price started to increase and so did the overall sentiment (but much steeper).
      After the first peak the price dropped and the sentiment followed, but it looks like people weren't ready to give up hope: a few hours later the price went up again.
      But what the heck do you think happened at 17:30?
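
The word-counting sentiment score described above can be sketched roughly as follows. This is only an illustration, not the actual implementation: the tiny `LEXICON` is a hypothetical stand-in for the real finance-specific word list (which isn't shown), and scaling by the largest absolute delta is an assumed way to "normalize and convert to percent" while keeping negative values negative.

```python
import re

# Hypothetical toy lexicon; the real list has thousands of
# finance-specific words labeled "neg" or "pos".
LEXICON = {
    "gain": "pos", "bullish": "pos", "profit": "pos",
    "crash": "neg", "fraud": "neg", "loss": "neg",
}

def sentiment_delta(text, lexicon=LEXICON):
    """Return #pos - #neg for a single tweet/post."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(1 for w in words if lexicon.get(w) == "pos")
    neg = sum(1 for w in words if lexicon.get(w) == "neg")
    return pos - neg

def to_percent(deltas):
    """Scale deltas by the largest absolute delta (an assumption),
    so every value lands in [-100, 100] and 0 stays neutral."""
    peak = max((abs(d) for d in deltas), default=0)
    if peak == 0:
        return [0.0 for _ in deltas]
    return [100.0 * d / peak for d in deltas]

def interval_sentiment(intervals, lexicon=LEXICON):
    """intervals: one list of texts per time interval.
    Returns one sentiment percentage per interval."""
    deltas = [sum(sentiment_delta(t, lexicon) for t in texts)
              for texts in intervals]
    return to_percent(deltas)
```

For example, `interval_sentiment([["profit profit"], ["crash"]])` gives one value per interval, positive for the first and negative for the second, ready to plot next to Hype %.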
     
  2. elavmunretea

    elavmunretea BANNED BANNED

    Joined:
    May 14, 2016
    Messages:
    2,220
    Likes Received:
    2,970
    Bitcoin was probably the wrong coin to choose for this, as the market cap is so high and the volume is high no matter whether the news is good or bad, since people use BTC to trade other cryptos.

    You would probably find more accurate results using a slightly smaller coin.
     
  3. healzer

    healzer Jr. VIP Jr. VIP

    That is very true. :)
    Right now I need to finish several things before adding more coins, then I can do some more serious analysis.
    But until then...
     
  4. Seagate44

    Seagate44 Regular Member

    Joined:
    Aug 14, 2016
    Messages:
    304
    Likes Received:
    50
    For me, sooner or later there will be media coverage about how f**d up the situation of bitcoin 'investors' is, due to a crisis/fraud/other made-up bullsh**, which will create real HYPE, but with negative sentiment, and that will cause a repeat of the Dotcom bubble. I'm not really into math, graphs, data engineering etc., but what I've learned in my (still short) life is that if everyone goes somewhere, then you shouldn't.

    You remember what Warren Buffett said -
    “Be Fearful When Others Are Greedy and Greedy When Others Are Fearful”

    For me, this is going to be a big headache for many 'investors' after all. And that's where I plan to make money though: buy futures and bank when everybody starts selling and the price drops like hell.
     
  5. healzer

    healzer Jr. VIP Jr. VIP

    @Seagate44 We think very alike :)
    It's definitely not a business (from an investor's point of view), unless you own the "stock exchange".
    Crypto is nothing more than speculation, unless you have a system/plan that is proven to work with a >50% success rate.
     
  6. issorc

    issorc Power Member

    Joined:
    Feb 20, 2016
    Messages:
    769
    Likes Received:
    669
    Gender:
    Male
    Super interesting thread. What does the system where you run all your analysis look like (hardware-wise)? Are there any limitations from Reddit's and Twitter's APIs? Good luck, I'll follow you closely. I guess it should be easy for you to implement more coins later, which will be interesting because small coins are most likely more hype-driven. Looking forward to this.
     
  7. Seagate44

    Seagate44 Regular Member

    There is no system/plan that works anywhere, bro. Just my opinion. People are unpredictable. You can predict trends, often very clearly, but you just can't predict that Volkswagen will turn out tomorrow to be a fraud with its diesel engines. You just can't know that unless you are in management or something alike (look at the share price drop just before the media covered the Volkswagen-gate. They started selling shares a few weeks before the whole fraud leaked to the public, so people at the top definitely knew what was going on).

    So, well, the only thing you can know for sure is that people will always behave as a crowd, and if you don't follow it but play against the crowd, most likely you win.
     
  8. nix.feliks

    nix.feliks Junior Member

    Joined:
    Apr 12, 2013
    Messages:
    156
    Likes Received:
    59
    Hi.

    Great analysis, thanx for doing this. The whole process is more exciting than results, at least for me :)

    Maybe it would be useful to control for the number of overall mentions (hype) when plotting sentiment and price. Without that control, you don't know what caused the rise or fall in sentiment: was it the share of positive/negative words in the mentions, or did the number of overall mentions simply rise or fall?

    So maybe you could just add a new measure, delta/hype, to control for this; that would be easy to implement. And this way you could get a more "pure" measure of sentiment.

    And perhaps you could look for positive and negative sentiment independently, and see how they relate to each other and the price. So, besides delta/hype, you could maybe plot positive/hype and negative/hype.

    I don't know whether this would bring any additional insight, just curious to see :)
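
A rough sketch of the three per-mention measures suggested here, assuming each interval already has a positive-word count, a negative-word count and a total mention count (hype). The function and key names are made up for illustration:

```python
def per_mention_measures(pos, neg, hype):
    """Divide the sentiment counts of one interval by its total
    number of mentions, so volume changes alone don't move the result.
    Returns None for an interval without any mentions."""
    if hype == 0:
        return None
    return {
        "delta_per_mention": (pos - neg) / hype,
        "pos_per_mention": pos / hype,
        "neg_per_mention": neg / hype,
    }

# Two intervals with the same mood but double the volume:
quiet = per_mention_measures(pos=30, neg=10, hype=100)
busy = per_mention_measures(pos=60, neg=20, hype=200)
```

The raw delta doubles between the two intervals (20 vs 40), while `delta_per_mention` is identical for both, which is the effect of controlling for hype.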

    Anyways, great thread and thanx once again.
     
  9. Lothric

    Lothric Regular Member

    Joined:
    Apr 25, 2017
    Messages:
    204
    Likes Received:
    50
    very interesting project
    what programming language are you using? and what data analysis / plotting library are you using?
     
  10. healzer

    healzer Jr. VIP Jr. VIP

    Thanks :)
    I'm running it on a single DigitalOcean droplet (just upgraded it to $40/mo plan).
    I love DO, they are also generally much cheaper than AWS.
    And once I start adding more coins I can simply distribute the workload across multiple droplets and have a cluster of worker nodes.
    At this stage the whole setup uses 2 CPU cores and about 4GB of memory -- but I also have other stuff running there besides this.
    * there are limitations regarding the APIs, I just make sure I don't cross any limits.
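
One common way to stay under an API limit is a sliding-window counter on the client side. The sketch below is an illustration only, not the setup actually used here; the class name and the example numbers are assumptions, and the real limits of the Reddit and Twitter APIs have to be looked up in their documentation.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `max_calls` calls per `period_s` seconds."""

    def __init__(self, max_calls, period_s, clock=time.monotonic):
        self.max_calls = max_calls
        self.period_s = period_s
        self.clock = clock      # injectable, so it can be tested
        self.calls = deque()    # timestamps of recent calls

    def wait_time(self):
        """Seconds to wait before the next call (0.0 = go ahead)."""
        now = self.clock()
        # Forget calls that have left the window.
        while self.calls and now - self.calls[0] >= self.period_s:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period_s - (now - self.calls[0])

    def record(self):
        """Note that a call was just made."""
        self.calls.append(self.clock())
```

A worker would call `wait_time()`, sleep for that long if it is non-zero, make the request, and then call `record()`.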

    Thanks for the tips.
    I already tried those when designing the sentiments.
    When plotting neg + pos separately it makes even less sense.
    The sentiment does appear to be meaningful in some regions, as explained above, but not all (hype is much more general and meaningful vs BTC's price).
    The approach I have now is pretty much recommended (from what I've learned), because 0% means neutral, 100% is most positive (in the data's range).
    I think I should follow @neu009 's advice on using Google NLP / IBM Watson and comparing their sentiment results against mine and see which ones appear to be better. :)

    I've mentioned this in my very first post: python, bash, php, chartJS for plotting.
    Hope it helps :)
     
  11. cnick79

    cnick79 Jr. VIP Jr. VIP

    Joined:
    Jun 10, 2010
    Messages:
    815
    Likes Received:
    417
    Location:
    Wandering
    The point of a project like the OP's is not necessarily to predict trends but to stay on top of them as they start. Sure, you can’t predict when Volkswagen is going to have fraud charges against them, but if you can programmatically follow news and see that a lot of bad news is coming, you can stay ahead of the crowd and sell before things get worse. I do agree with Warren Buffett and buy when people are fearful, but I don’t buy at the start of the day before things get bad.
     
  12. Seagate44

    Seagate44 Regular Member

    And the solution is within your post - follow news. :)
     
  13. bmanfacts

    bmanfacts Junior Member

    Joined:
    Jul 29, 2015
    Messages:
    187
    Likes Received:
    68
    Occupation:
    Blogging, Content Creating, Thinking, Doing.
    Location:
    Somewhere Intelligent
    This is the evolution of the Crypto thread. Can we make things like this the main forum for crypto and let price predictions go elsewhere?

    I do have some thoughts and love engaging in discussions like these; however, my participation will have to wait a bit. I love what's going on here.

    Keep testing. I'm in agreement with the suggestions from @elavmunretea, just to SparkNotes my thoughts on the subject thus far. Contributions will come soon. Subbed to this thread :)
     
  14. NIZAR BAMIDA

    NIZAR BAMIDA Newbie

    Joined:
    Jan 6, 2018
    Messages:
    15
    Likes Received:
    0
    Gender:
    Male
    That's great man, keep it up
     
  15. zionbar

    zionbar Jr. VIP Jr. VIP

    Joined:
    Apr 6, 2015
    Messages:
    1,402
    Likes Received:
    814
    Gender:
    Male
    Occupation:
    Entrepreneur
    Location:
    Sunny Florida
    That's interesting mate, thanks for the post. I'd appreciate it if you started sending us alerts based on the models you've developed :)
     
  16. healzer

    healzer Jr. VIP Jr. VIP

    That's a feature in our pipeline to auto-notify when something is about to go down.
    Until then stay tuned on my future updates :)

    ========
    ======== Jan 7, 2018
    ========

    Just pulled yet another all-nighter since I couldn't sleep due to certain bugs.
    This Apache Spark is killing me: it's the only component in my system that for some reason randomly freezes/fails without any logs/errors.
    I searched the entire internet, encountered a few people who reported the same problem, but not a single solution was present.
    Apache Spark is not that popular, since it's only used for very big data processing and analytics (by big corps such as eBay, ...).

    On the side I have also been playing around with TensorFlow (which appeared to be too complex for now), so I switched to Keras for machine learning and prediction making.
    I spent many hours trying to understand Keras, numpy array transformations, etc... but eventually cracked the code.
    In my case it is all about training a neural-network (the brain) by feeding it historical data.
    So then once you have a "brain" you can feed it new data and it will predict the new output.
    The idea is then to feed it the current price and current hype, so it will predict the hype and price of the next hour:

    [​IMG]
    The lower graph is my real data (black = BTC price).
    I trained the "brain" using its hourly data until the zone marked in red, that part I did not feed into the brain for training.
    Once the training phase was done, I inserted the red-zone, and it reproduced it pretty well as you can see in the top chart (green line).
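
The data preparation behind this kind of test can be sketched in plain Python. This is not my actual Keras code, just an illustration under assumptions: the function names are made up, each training example pairs one hour's (price, hype) with the next hour's, and the last hours are held out like the red zone. The "next hour equals this hour" baseline is an extra yardstick added for comparison, since any trained model should at least beat it on the held-out part.

```python
def make_pairs(prices, hypes):
    """Pair each hour's (price, hype) with the next hour's (price, hype)."""
    rows = list(zip(prices, hypes))
    return [(rows[i], rows[i + 1]) for i in range(len(rows) - 1)]

def chronological_split(pairs, holdout):
    """Keep the last `holdout` pairs out of training (the 'red zone').
    No shuffling, so the model never sees the future."""
    if not 0 < holdout < len(pairs):
        raise ValueError("holdout must be between 1 and len(pairs) - 1")
    return pairs[:-holdout], pairs[-holdout:]

def persistence_mae(test_pairs):
    """Mean absolute price error of the naive forecast
    'next hour's price equals this hour's price'."""
    errors = [abs(target[0] - current[0]) for current, target in test_pairs]
    return sum(errors) / len(errors)

# Made-up hourly numbers, only to show the shapes involved.
prices = [100, 102, 101, 105, 107]
hypes = [10, 12, 11, 15, 14]
train, test = chronological_split(make_pairs(prices, hypes), holdout=2)
baseline = persistence_mae(test)
```

The `train` pairs are what would be fed to the network, and `test` is what the predictions get compared against afterwards.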

    Over the course of the next few days I'll keep playing with Keras and see how I can optimize it to make above-average predictions for the near future.
    I also have to figure out this Apache Spark thing. This morning I came up with my n-th hypothesis of what may cause it to hang/freeze. Let's try it out :)

    Have a great day all!!
     
  17. mindmaster

    mindmaster Jr. VIP Jr. VIP

    Joined:
    Sep 16, 2010
    Messages:
    3,540
    Likes Received:
    1,608
    Home Page:
    Mate this is golden.

    I appreciate that you are sharing it with us so far.

    I'll keep on following.
     
  18. cnick79

    cnick79 Jr. VIP Jr. VIP

    Where did you get your historical data from?
     
  19. IamNRE

    IamNRE Jr. VIP Jr. VIP Premium Member

    Joined:
    Aug 18, 2010
    Messages:
    5,027
    Likes Received:
    7,413
    Occupation:
    Helping Small Businesses Get More Calls
    Home Page:
    Haha. What an awesome thread!

    Can't wait for the beta :D
     
  20. Unreliable Witness

    Unreliable Witness Regular Member

    Joined:
    Apr 21, 2016
    Messages:
    470
    Likes Received:
    242